Re: [HACKERS] Proposal for GUID datatype
On Sat, Sep 09, 2006 at 01:03:24AM -0400, Jan de Visser wrote: > On Saturday 09 September 2006 00:42, [EMAIL PROTECTED] wrote: > > But if the input isn't 32 hexadecimal characters - I don't see how > > it fits the UUID/GUID type. > Again, it wasn't about that particular *value* (which, as I > concurred, is not a [GU]UID). It was about the fact that different > tools spit out stuff in different formats, and that it would suck > royally if you would have to convert because Gevik didn't think of > the particular format of your particular tool. > But I guess your suggestion of just ignoring punctuation would work > just fine. I don't think so. If it isn't 128 bits - and you want to fit it into 128 bits, it means padding. Where should the padding go? As application specific, it is up to the application to convert. I don't agree that a GUID/UUID type should do much more than allow a GUID/UUID to fit into it. It is not a type that can be used to represent any UID you happen to have. With a 64 bit UID or 96 bit UID - why would you want to store it in 128-bits anyways? This would be a waste of space. For the same reason I wouldn't want to store an MD5SUM into a UUID type. It's an abuse of the type. If the type was called hexstring32 or something like that, sure. Philosophy is involved. Making UUID understand every possible format, and know how to pad each format that doesn't use 32 hexadecimal characters is outside the scope of this discussion, and guarantees that it will never be implemented. There will always be one more format somebody wants to cram into it. UUID is not a UID. UUID is a 128-bit number with a fairly specific format that includes a version number, and each of the bit segments within it are used to signify a different type of data, based upon the version number. Cheers, mark -- [EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] __ . . _ ._ . . .__. . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/|_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... http://mark.mielke.cc/ ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Proposal for GUID datatype
Jan de Visser wrote: > On Saturday 09 September 2006 00:42, [EMAIL PROTECTED] wrote: > > But if the input isn't 32 hexadecimal characters - I don't see how > > it fits the UUID/GUID type. > > Again, it wasn't about that particular *value* (which, as I concurred, is not > a [GU]UID). It was about the fact that different tools spit out stuff in > different formats, and that it would suck royally if you would have to > convert because Gevik didn't think of the particular format of your > particular tool. > > But I guess your suggestion of just ignoring punctuation would work just fine. Would it? The output you showed was 3b732da7:10d9029b3eb:-8000 What to do with the :-8000 part? Do you discard the whole :-8000? Do you discard the :- and keep the 8000? -- Alvaro Herrerahttp://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc. ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Proposal for GUID datatype
On Saturday 09 September 2006 00:42, [EMAIL PROTECTED] wrote: > But if the input isn't 32 hexadecimal characters - I don't see how > it fits the UUID/GUID type. Again, it wasn't about that particular *value* (which, as I concurred, is not a [GU]UID). It was about the fact that different tools spit out stuff in different formats, and that it would suck royally if you would have to convert because Gevik didn't think of the particular format of your particular tool. But I guess your suggestion of just ignoring punctuation would work just fine. jan -- -- Jan de Visser [EMAIL PROTECTED] Baruk Khazad! Khazad ai-menu! -- ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Proposal for GUID datatype
On Fri, Sep 08, 2006 at 10:49:21PM -0400, Jan de Visser wrote: > On Friday 08 September 2006 21:34, [EMAIL PROTECTED] wrote: > > On Fri, Sep 08, 2006 at 09:24:19PM -0400, Jan de Visser wrote: > > > On Friday 08 September 2006 15:18, Gevik Babakhani wrote: > > > > 2a) Three input formats are supported. > > > > example: > > > > insert into tbl (fld) values('1dfb39af-b56a-40b8-a903-b5b31567c3ce'); > > > > insert into tbl (fld) values('{1dfb39af-b56a-40b8-a903-b5b31567c3ce}'); > > > > insert into tbl (fld) values('1dfb39afb56a40b8a903b5b31567c3ce'); > > > > > > Please extend your list. java's RMI package generates UIDs with the > > > following format: > > > > > > [head order 21:19]$ bsh.sh > > > BeanShell 1.3.0 - by Pat Niemeyer ([EMAIL PROTECTED]) > > > bsh % print(new java.rmi.server.UID()); > > > 3b732da7:10d9029b3eb:-8000 > > > bsh % > > That doesn't look like a UUID/GUID - unless it trims leading zeroes? > > UUID/GUID = 128-bits = 32 hexadecimal characters. > I know, it's a UID. Not a GUID. But many people (including myself) > use it as a base to generate GUIDs. And that's not really the > point. The point is that I'm afraid he's going to restrict himself > to much. I'm not sure what it would do with this input though. Where do the bits go in a fixed 128-bit field? I think with and with dashes is fine. { and } starts to get into the unnecessary. At this point, it may as well ignore all punctuation characters, which stays fine. Less error checking required. But if the input isn't 32 hexadecimal characters - I don't see how it fits the UUID/GUID type. Cheers, mark -- [EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] __ . . _ ._ . . .__. . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/|_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... http://mark.mielke.cc/ ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] ISBN/ISSN/ISMN/EAN13 module
"Jeremy Kronuz" <[EMAIL PROTECTED]> writes: >> I'm reviewing this for addition to contrib/ now. I notice that there is >> no clear license statement. Is it OK to put the following into the >> README file? > Tom, yes. Also, I just put copyright in the files that contain a significant > amount of code written by me. (i.e. isn.h and isn.c). Excellent, thanks. > Also, did you get the 'isn-1_0_beta_20060924' version from my ftp? 'cause > thats the last one I updated (2006-09-24). It's at: > ftp://ftp.kronuz.com/pub/programming/isn-1_0_beta_20060924.tar.bz2 (just in > case) You had submitted two slightly different versions to the mailing list awhile back. I took the one that seemed to have later file dates and did some fixes/editorializations on that. Please look at what I've just committed to PG CVS and see if you want to make any adjustments --- if so, submit a patch through the usual pgsql-patches channel. regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] ISBN/ISSN/ISMN/EAN13 module
"Jeremy Kronuz" <[EMAIL PROTECTED]> writes: > Hello again,This is an usable version of my EAN13/UPC/ISBN module. I'm reviewing this for addition to contrib/ now. I notice that there is no clear license statement. Is it OK to put the following into the README file? EAN13 - UPC - ISBN (books) - ISMN (music) - ISSN (serials) -- Copyright Germán Méndez Bravo (Kronuz), 2004 - 2006 This module is released under the same BSD license as the rest of PostgreSQL. The information to implement this module was collected through ... Tom, yes. Also, I just put copyright in the files that contain a significant amount of code written by me. (i.e. isn.h and isn.c). If you check the other files, I just put the information was recompiled from several sources and I also included the websites from where the information came. I suppose I just forgot to put the BSD license legend there in the code, so please add the line you suggested about the PostgreSQL BSD license if you want. Also, did you get the 'isn-1_0_beta_20060924' version from my ftp? 'cause thats the last one I updated (2006-09-24). It's at: ftp://ftp.kronuz.com/pub/programming/isn-1_0_beta_20060924.tar.bz2 (just in case) Kronuz. ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PATCHES] [HACKERS] Fix linking of OpenLDAP libraries
"Albe Laurenz" <[EMAIL PROTECTED]> writes: > Here is a new patch that replaces the previous one; it adds two > macros LDAP_LIBS_FE and LDAP_LIBS_BE for frontend and backend, > respectively. > I did not only add them to the Makefile for interfaces/libpq, > but also everywhere something is linked against libpq in case > somebody links static. Applied, but without that last part. It builds OK for me on Darwin, which is moderately picky about that sort of thing, but someone should try AIX. regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Proposal for GUID datatype
On Friday 08 September 2006 21:34, [EMAIL PROTECTED] wrote: > On Fri, Sep 08, 2006 at 09:24:19PM -0400, Jan de Visser wrote: > > On Friday 08 September 2006 15:18, Gevik Babakhani wrote: > > > 2a) Three input formats are supported. > > > example: > > > insert into tbl (fld) values('1dfb39af-b56a-40b8-a903-b5b31567c3ce'); > > > insert into tbl (fld) values('{1dfb39af-b56a-40b8-a903-b5b31567c3ce}'); > > > insert into tbl (fld) values('1dfb39afb56a40b8a903b5b31567c3ce'); > > > > Please extend your list. java's RMI package generates UIDs with the > > following format: > > > > [head order 21:19]$ bsh.sh > > BeanShell 1.3.0 - by Pat Niemeyer ([EMAIL PROTECTED]) > > bsh % print(new java.rmi.server.UID()); > > 3b732da7:10d9029b3eb:-8000 > > bsh % > > That doesn't look like a UUID/GUID - unless it trims leading zeroes? > > UUID/GUID = 128-bits = 32 hexadecimal characters. I know, it's a UID. Not a GUID. But many people (including myself) use it as a base to generate GUIDs. And that's not really the point. The point is that I'm afraid he's going to restrict himself to much. > > Cheers, > mark jan -- -- Jan de Visser [EMAIL PROTECTED] Baruk Khazad! Khazad ai-menu! -- ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Build v8.1.4 with VC++ 2005
On 9/8/06, Xiaofeng Zhao <[EMAIL PROTECTED]> wrote: I tried to build postgresql from src using vc++ 2005 but was not able to go far before hitting errors related to inline function in the "wchar.c" file. The source file I downloaded is "postgresql-8.1.4.tar.gz". I tried to build it by runing "nmake -f win32.mak" in the "src" directory. Can v8.1.4 be build using VC++ 2005? If so, what is the build procedure and if any patches is required. there is a patch in the queue which has a decent chance of making it into 8.2. from the stock 8.1 sources it is impossible. merlin ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
[HACKERS] Build v8.1.4 with VC++ 2005
I tried to build postgresql from src using vc++ 2005 but was not able to go far before hitting errors related to inline function in the "wchar.c" file. The source file I downloaded is "postgresql-8.1.4.tar.gz". I tried to build it by runing "nmake -f win32.mak" in the "src" directory. Can v8.1.4 be build using VC++ 2005? If so, what is the build procedure and if any patches is required. Thanks!
Re: [HACKERS] Proposal for GUID datatype
On Fri, Sep 08, 2006 at 09:24:19PM -0400, Jan de Visser wrote: > On Friday 08 September 2006 15:18, Gevik Babakhani wrote: > > 2a) Three input formats are supported. > > example: > > insert into tbl (fld) values('1dfb39af-b56a-40b8-a903-b5b31567c3ce'); > > insert into tbl (fld) values('{1dfb39af-b56a-40b8-a903-b5b31567c3ce}'); > > insert into tbl (fld) values('1dfb39afb56a40b8a903b5b31567c3ce'); > Please extend your list. java's RMI package generates UIDs with the > following format: > [head order 21:19]$ bsh.sh > BeanShell 1.3.0 - by Pat Niemeyer ([EMAIL PROTECTED]) > bsh % print(new java.rmi.server.UID()); > 3b732da7:10d9029b3eb:-8000 > bsh % That doesn't look like a UUID/GUID - unless it trims leading zeroes? UUID/GUID = 128-bits = 32 hexadecimal characters. Cheers, mark -- [EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] __ . . _ ._ . . .__. . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/|_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... http://mark.mielke.cc/ ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Proposal for GUID datatype
On Friday 08 September 2006 15:18, Gevik Babakhani wrote: > 2a) Three input formats are supported. > example: > insert into tbl (fld) values('1dfb39af-b56a-40b8-a903-b5b31567c3ce'); > insert into tbl (fld) values('{1dfb39af-b56a-40b8-a903-b5b31567c3ce}'); > insert into tbl (fld) values('1dfb39afb56a40b8a903b5b31567c3ce'); Please extend your list. java's RMI package generates UIDs with the following format: [head order 21:19]$ bsh.sh BeanShell 1.3.0 - by Pat Niemeyer ([EMAIL PROTECTED]) bsh % print(new java.rmi.server.UID()); 3b732da7:10d9029b3eb:-8000 bsh % So forms that use colons instead of dashes seem appropriate. Or better still, make it configurable. jan -- -- Jan de Visser [EMAIL PROTECTED] Baruk Khazad! Khazad ai-menu! -- ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Domains and subtypes, a brief proposal
On Fri, Sep 08, 2006 at 05:20:18PM -0400, Tom Lane wrote: > elein <[EMAIL PROTECTED]> writes: > > I think what you are saying is that the domain checking (proposed constraint > > existence checking) would need to be done in more places and I'm not sure I > > understand this. > > What I'm complaining about is the need to search the catalogs to see if > a datatype has constraints. At the moment we need to do that only > for operations yielding domain types. Your proposal appears to require > that it be done for *every* operation on *every* datatype, right down > to int4 and bool (which at the very least creates some interesting > circularity issues). I'm not willing to accept that much overhead on > the strength of what is frankly a pretty weak case. If you want a > constraint, what's wrong with putting a domain on your base type to > enforce it? > > > And checking for constraint <> NULL should be equivalent to the > > current check *typtype != 'd'. > > Not without an amazingly complicated substructure to the "constraint" > column (multiple entries, names as well as expressions, etc). At the > very least that's a violation of relational theory, and I'm not sure how > we're going to handle dependencies of the constraint expressions at all > if they aren't separate catalog entries. > > regards, tom lane I'm seeing the constraint column as an attribute of the type. It would at least flag existence of a constraint on a type. This is necessary for domains as they work now (but we check typtype). This would also be a catalog change, i.e. non-trivial. It would link types to constraints only and replace the 'd' value of typtype. Perhaps my ignorance is showing and I'm missing something. How you have the ability to select the constraint from the domain name now is fuzzy to me. But I'm trying to move "domains" from a type of type to just an attribute of type with the goal of simplifying the behaviour. The single value result of an expression, on assignment is the only candidate for constraint checking. Other expression evaluation would behave as is until assignment to a final result. Theoretically, you would not have to add constraints other than those defined now. And this should already be working though the implementation would change slightly with my proposal. elein [EMAIL PROTECTED] ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Fixed length data types issue
On Fri, Sep 08, 2006 at 04:49:22PM -0400, Andrew Dunstan wrote: > [EMAIL PROTECTED] wrote: > >Only ASCII values store more space efficiently in UTF-8. All values > >over 127 store more space efficiently using UTF-16. > This second statement is demonstrably not true. Only values above 0x07ff > require more than 2 bytes in UTF-8. All chars up to that point are > stored in UTF-8 with greater or equal efficiency than that of UTF-16. > See http://www.zvon.org/tmRFC/RFC2279/Output/chapter2.html You are correct - I should have said "All values over 127 store at least as space efficiently using UTF-16 as UTF-8." >From the ICU page: "Most of the time, the memory throughput of the hard drive and RAM is the main performance constraint. UTF-8 is 50% smaller than UTF-16 or US-ASCII, but UTF-8 is 50% larger than UTF-16 or East and South Asian scripts. There is no memory difference for Latin extensions, Greek, Cyrillic, Hebrew, and Arabic. For processing Unicode data, UTF-16 is much easier to handle. You get a choice between either one or two units per character, not a choice among four lengths. UTF-16 also does not have illegal 16-bit unit values, while you might want to check or illegal bytes in UTF-8. Incomplete character sequences in UTF-16 are less important and more benign. If you want to quickly convert small strings between the different UTF encodings or get a UChar32 value, you can use the macros provided in utf.h and ..." I didn't think of the iterators for simple uses. Cheers, mark -- [EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] __ . . _ ._ . . .__. . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/|_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... http://mark.mielke.cc/ ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
[HACKERS] Mapping arbitriary and heirachical XML to tuple
I have a system by which I store complex data in PostgreSQL as an XML string. I have a simple function that can return a single value. I would like to return sets and sets of rows from the data. This is not a huge problem, as I've written a few of these functions. The question I'd like to put out there, is how would you represent heirarchical data as: 0 1.00 2.5 3.5 5 10 1 1.10 2.2 3.53 3 9 The biggest problem with XML is storing data is easy, getting it back out in a sane way is less so. How would you guys think to represent this? (Obviously, this is a bogus example, real life would be much worse!) ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Domains and subtypes, a brief proposal
elein <[EMAIL PROTECTED]> writes: > I think what you are saying is that the domain checking (proposed constraint > existence checking) would need to be done in more places and I'm not sure I > understand this. What I'm complaining about is the need to search the catalogs to see if a datatype has constraints. At the moment we need to do that only for operations yielding domain types. Your proposal appears to require that it be done for *every* operation on *every* datatype, right down to int4 and bool (which at the very least creates some interesting circularity issues). I'm not willing to accept that much overhead on the strength of what is frankly a pretty weak case. If you want a constraint, what's wrong with putting a domain on your base type to enforce it? > And checking for constraint <> NULL should be equivalent to the > current check *typtype != 'd'. Not without an amazingly complicated substructure to the "constraint" column (multiple entries, names as well as expressions, etc). At the very least that's a violation of relational theory, and I'm not sure how we're going to handle dependencies of the constraint expressions at all if they aren't separate catalog entries. regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Fixed length data types issue
On Fri, Sep 08, 2006 at 04:49:49PM -0400, Alvaro Herrera wrote: > Actually he muttered something about iterators, and not needing to > convert anything. Yes, many of the useful functions accept strings in two forms, either UTF-16 or CharacterIterators. The iterator pretty much only has to know how to step forward through the string and return the code point at each point. Here's the docs for C++ class, but there's a equivalent C interface. http://icu.sourceforge.net/apiref/icu4c/classCharacterIterator.html#_details Have a nice day, -- Martijn van Oosterhout http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to > litigate. signature.asc Description: Digital signature
Re: [HACKERS] Domains and subtypes, a brief proposal
On Fri, Sep 08, 2006 at 03:47:23PM -0400, Tom Lane wrote: > elein <[EMAIL PROTECTED]> writes: > > a) if subtypes/domains can have constraints then the model should > >not be different for domains only but for all types. Constraint > >checking would only > >occur at check constraint points--and there for any type. You > >already check for the existance of a domain. Change that test > >to the existence of a constraint only and eliminate domain specific > >code. > > Au contraire, the test whether a constraint actually exists occurs at > runtime, not at the time we check for domain-ness. Your proposal would > force such checks to be introduced into every single expression > evaluation. It's not feasible at all without plan invalidation, and > even with that I foresee fairly enormous added overhead. Our experience > with domains so far is that looking up those constraints is *expensive*. For domain checking isn't expression evaluation required anyway? email := email_value || email_value should fail on a constraint check for result value at assignment time. I think what you are saying is that the domain checking (proposed constraint existence checking) would need to be done in more places and I'm not sure I understand this. I believe constraints checking should done less often than input types. And checking for constraint <> NULL should be equivalent to the current check *typtype != 'd'. I could be wrong base on the current implementation. There may be more to it, but I suspect making sure the constraint value is available when you fetch a type would be necessary. Turn the thing around a bit. The contraint is an attribute on anytype. All type code, only where appropriate, should check for existence of the constraint attribute. This is different from saying domains as special types and need special casing in places (other than constraint checking). I'm trying to remove the specialness from domains so that the type code can pretty well work as is in all places execpt checking for the constraint attribute of a type. This should solve some of the existing domain problems. We're pretty close to this as is, but there is still a lot of special casing going on. The ability to add constraints to any type should only be considered as a logical extension made easier by the change in what you test when you test for constraints. elein [EMAIL PROTECTED] ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] log_duration is redundant, no?
"Guillaume Smet" <[EMAIL PROTECTED]> writes: > Do you want me to propose a patch or do you prefer to work on it > yourself? It's done already ... regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] log_duration is redundant, no?
Tom, On 9/8/06, Tom Lane <[EMAIL PROTECTED]> wrote: It seems like we should either remove the separate log_duration boolean or make it work as he suggests. I'm leaning to the second answer now. Do you want me to propose a patch or do you prefer to work on it yourself? If so, do we keep the log_duration name or do we change it to log_all_duration or another more appropriate name? I attached the little patch I use to apply on our packages. I can work on it to make it apply to HEAD and update the documentation. I suppose we should also change the FE/BE protocol logging accordingly but ISTM you already planned to change it for other reasons. -- Guillaume Index: src/backend/tcop/postgres.c === RCS file: /projects/cvsroot/pgsql/src/backend/tcop/postgres.c,v retrieving revision 1.483 diff -u -r1.483 postgres.c --- src/backend/tcop/postgres.c 4 Apr 2006 19:35:35 - 1.483 +++ src/backend/tcop/postgres.c 6 Apr 2006 12:12:00 - @@ -1092,14 +1092,6 @@ usecs = (long) (stop_t.tv_sec - start_t.tv_sec) * 100 + (long) (stop_t.tv_usec - start_t.tv_usec); - /* Only print duration if we previously printed the statement. */ - if (was_logged && save_log_duration) - ereport(LOG, - (errmsg("duration: %ld.%03ld ms", - (long) ((stop_t.tv_sec - start_t.tv_sec) * 1000 + - (stop_t.tv_usec - start_t.tv_usec) / 1000), - (long) (stop_t.tv_usec - start_t.tv_usec) % 1000))); - /* * Output a duration_statement to the log if the query has exceeded * the min duration, or if we are to print all durations. @@ -1107,6 +1099,7 @@ if (save_log_min_duration_statement == 0 || (save_log_min_duration_statement > 0 && usecs >= save_log_min_duration_statement * 1000)) + { ereport(LOG, (errmsg("duration: %ld.%03ld ms statement: %s%s", (long) ((stop_t.tv_sec - start_t.tv_sec) * 1000 + @@ -1114,6 +1107,17 @@ (long) (stop_t.tv_usec - start_t.tv_usec) % 1000, query_string, prepare_string ? prepare_string : ""))); + } + else + { + /* Print duration if we did not print it before. */ + if (save_log_duration) +ereport(LOG, + (errmsg("duration: %ld.%03ld ms", +(long) ((stop_t.tv_sec - start_t.tv_sec) * 1000 + + (stop_t.tv_usec - start_t.tv_usec) / 1000), + (long) (stop_t.tv_usec - start_t.tv_usec) % 1000))); + } } if (save_log_statement_stats) ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Fixed length data types issue
[EMAIL PROTECTED] wrote: Only ASCII values store more space efficiently in UTF-8. All values over 127 store more space efficiently using UTF-16. This second statement is demonstrably not true. Only values above 0x07ff require more than 2 bytes in UTF-8. All chars up to that point are stored in UTF-8 with greater or equal efficiency than that of UTF-16. See http://www.zvon.org/tmRFC/RFC2279/Output/chapter2.html cheers andrew ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Fixed length data types issue
[EMAIL PROTECTED] wrote: > On Fri, Sep 08, 2006 at 04:42:09PM -0400, Alvaro Herrera wrote: > > But Martijn already clarified that ICU does not actually force you to > > switch everything to UTF-16, so this is not an issue anyway. > > If my memory is correct, it does this by converting it to UTF-16 first. > This is a performance disadvantage (although it may not be worse than > PostgreSQL's current implementation :-) ). Actually he muttered something about iterators, and not needing to convert anything. -- Alvaro Herrerahttp://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc. ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Fixed length data types issue
On Fri, Sep 08, 2006 at 04:42:09PM -0400, Alvaro Herrera wrote: > [EMAIL PROTECTED] wrote: > > The authors of the library in question? Java? Anybody whose primary > > alphabet isn't LATIN1 based? :-) > Well, for Latin-9 alphabets, Latin-9 is still more space-efficient than > UTF-8. That covers a lot of the world. Forcing those people to change > to UTF-16 does not strike me as a very good idea. Ah. Thought you were talking UTF-8 vs UTF-16. > But Martijn already clarified that ICU does not actually force you to > switch everything to UTF-16, so this is not an issue anyway. If my memory is correct, it does this by converting it to UTF-16 first. This is a performance disadvantage (although it may not be worse than PostgreSQL's current implementation :-) ). > > Only ASCII values store more space efficiently in UTF-8. All values > > over 127 store more space efficiently using UTF-16. UTF-16 is easier > > to process. UTF-8 requires too many bit checks with single character > > offsets. I'm not an expert - I had this question before a year or two > > ago, and read up on the ideas of experts. > Well, I was not asking about "UTF-8 vs UTF-16," but rather "anything vs. > UTF-16". I don't much like UTF-8 myself, but that's not a very informed > opinion, just like a feeling of "fly-killing-cannon" (when it's used to > store Latin-9-fitting text). *nod* Cheers, mark -- [EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] __ . . _ ._ . . .__. . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/|_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... http://mark.mielke.cc/ ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Proposal for GUID datatype
On Fri, 2006-09-08 at 16:17 -0400, Tom Lane wrote: > Gevik Babakhani <[EMAIL PROTECTED]> writes: > > typreceive = not supported > > typsend = not supported > > Really? Why not? You are right, typreceive/typsend are also needed. How would you advice to test this? > I would suggest that the default output format just be 32 hex > characters, since that would render the type useful for purposes > other than one narrow definition of UUID. Agreed. > > regards, tom lane > > ---(end of broadcast)--- > TIP 1: if posting/reading through Usenet, please send an appropriate >subscribe-nomail command to [EMAIL PROTECTED] so that your >message can get through to the mailing list cleanly > ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Fixed length data types issue
[EMAIL PROTECTED] wrote: > On Fri, Sep 08, 2006 at 02:39:03PM -0400, Alvaro Herrera wrote: > > [EMAIL PROTECTED] wrote: > > > I think I've been involved in a discussion like this in the past. Was > > > it mentioned in this list before? Yes the UTF-8 vs UTF-16 encoding > > > means that UTF-8 applications are at a disadvantage when using the > > > library. UTF-16 is considered more efficient to work with for everybody > > > except ASCII users. :-) > > Uh, is it? By whom? And why? > > The authors of the library in question? Java? Anybody whose primary > alphabet isn't LATIN1 based? :-) Well, for Latin-9 alphabets, Latin-9 is still more space-efficient than UTF-8. That covers a lot of the world. Forcing those people to change to UTF-16 does not strike me as a very good idea. But Martijn already clarified that ICU does not actually force you to switch everything to UTF-16, so this is not an issue anyway. > Only ASCII values store more space efficiently in UTF-8. All values > over 127 store more space efficiently using UTF-16. UTF-16 is easier > to process. UTF-8 requires too many bit checks with single character > offsets. I'm not an expert - I had this question before a year or two > ago, and read up on the ideas of experts. Well, I was not asking about "UTF-8 vs UTF-16," but rather "anything vs. UTF-16". I don't much like UTF-8 myself, but that's not a very informed opinion, just like a feeling of "fly-killing-cannon" (when it's used to store Latin-9-fitting text). -- Alvaro Herrerahttp://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Fixed length data types issue
On Fri, Sep 08, 2006 at 02:39:03PM -0400, Alvaro Herrera wrote: > [EMAIL PROTECTED] wrote: > > I think I've been involved in a discussion like this in the past. Was > > it mentioned in this list before? Yes the UTF-8 vs UTF-16 encoding > > means that UTF-8 applications are at a disadvantage when using the > > library. UTF-16 is considered more efficient to work with for everybody > > except ASCII users. :-) > Uh, is it? By whom? And why? The authors of the library in question? Java? Anybody whose primary alphabet isn't LATIN1 based? :-) Only ASCII values store more space efficiently in UTF-8. All values over 127 store more space efficiently using UTF-16. UTF-16 is easier to process. UTF-8 requires too many bit checks with single character offsets. I'm not an expert - I had this question before a year or two ago, and read up on the ideas of experts. Cheers, mark -- [EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] __ . . _ ._ . . .__. . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/|_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... http://mark.mielke.cc/ ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Proposal for GUID datatype
Just a few comments, On Fri, Sep 08, 2006 at 09:18:20PM +0200, Gevik Babakhani wrote: > 5) support functions: > because uuid could also be used as PK or unique values, additional > function(s) will be available to produce a uuid value to be used in > a field's default value like sequences or PL/pgSQL etc.. etc... > > example; > > create table tbl( > ID uuid default ('new_uuid()'), > > ); That would be: ID uuid default new_uuid(); > typstorage = m // stored compressed inline Compression is not going to work on such short values, the header will almost longer, just use 'p' like every other fixed length type. Have a nice day, -- Martijn van Oosterhout http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to > litigate. signature.asc Description: Digital signature
Re: [HACKERS] Proposal for GUID datatype
Gevik Babakhani <[EMAIL PROTECTED]> writes: > typreceive = not supported > typsend = not supported Really? Why not? I would suggest that the default output format just be 32 hex characters, since that would render the type useful for purposes other than one narrow definition of UUID. regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Proposal for GUID datatype
Martijn van Oosterhout wrote: Just a few comments, On Fri, Sep 08, 2006 at 09:18:20PM +0200, Gevik Babakhani wrote: 5) support functions: because uuid could also be used as PK or unique values, additional function(s) will be available to produce a uuid value to be used in a field's default value like sequences or PL/pgSQL etc.. etc... example; create table tbl( ID uuid default ('new_uuid()'), ); That would be: ID uuid default new_uuid(); typstorage = m // stored compressed inline Compression is not going to work on such short values, the header will almost longer, just use 'p' like every other fixed length type. Have a nice day, Point taken, thank you. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Fixed length data types issue
Tom Lane wrote: > Bruce Momjian <[EMAIL PROTECTED]> writes: > > No one has mentioned that we page value on disk to match the CPU > > alignment. This is done for efficiency, but is not strictly required. > > Well, it is unless you are willing to give up support of non-Intel CPUs; > most other popular chips are strict about alignment, and will fail an > attempt to do a nonaligned fetch. > > The only way we could pack stuff without alignment is to go over to the > idea that memory and disk representations are different --- where in > this case the "conversion" might just be a memcpy to a known-aligned > location. The performance costs of that seem pretty daunting, however, > especially when you reflect that simply stepping over a varlena field > would require memcpy'ing its length word to someplace. Agreed, but I thought I would point it out. -- Bruce Momjian [EMAIL PROTECTED] EnterpriseDBhttp://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PATCHES] [HACKERS] Fix linking of OpenLDAP libraries
On Fri, Sep 08, 2006 at 03:20:00PM -0400, Tom Lane wrote: > "Albe Laurenz" <[EMAIL PROTECTED]> writes: > > I did not only add them to the Makefile for interfaces/libpq, > > but also everywhere something is linked against libpq in case > > somebody links static. > If we do need to start mentioning all of libpq's dependencies everywhere > it's linked, I think it's time for a generic solution to that, instead > of hacking each such place over again every time a new dependency pops up. The business of having to include every single dependancy when linking static is quite irritating. It has almost reached the point where people are just giving up static linking because it's too much of a pain. However, if we do want to support it, the way you do it is by extending pg_config to do something like: pg_config --dynamic-lick => returns -lpq pg_config --static-link=> returns -lpq That way only people who actually want static linking need be bothered. Have a ncie day, -- Martijn van Oosterhout http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to > litigate. signature.asc Description: Digital signature
Re: [HACKERS] Domains and subtypes, a brief proposal
elein <[EMAIL PROTECTED]> writes: > a) if subtypes/domains can have constraints then the model should >not be different for domains only but for all types. Constraint >checking would only >occur at check constraint points--and there for any type. You >already check for the existance of a domain. Change that test >to the existence of a constraint only and eliminate domain specific >code. Au contraire, the test whether a constraint actually exists occurs at runtime, not at the time we check for domain-ness. Your proposal would force such checks to be introduced into every single expression evaluation. It's not feasible at all without plan invalidation, and even with that I foresee fairly enormous added overhead. Our experience with domains so far is that looking up those constraints is *expensive*. regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Domains and subtypes, a brief proposal
On Fri, Sep 08, 2006 at 02:33:13PM -0400, Tom Lane wrote: > elein <[EMAIL PROTECTED]> writes: > > Domains and subtypes. > > >* Create new child type from values in parent type. > >* Maintain only checks for constraints > >* Create implicit casts from child to parent > > This seems a bit content-free, because it's not clear how it differs > from what we do now. We already have implicit child-to-parent casts. I guess the key point here was to treat the domains as proper udt types except where constraint checking is required. And yes, this is already done, but it needed to be included for context. > > > Constraints on types: > > >* Change the pg_types to hold a NULLABLE constraint text column > > OR add a type constraint lookup table (pg_domains?) > > I understand that you are arguing to allow constraints to be associated > with any type not only domains, but > (a) I don't see why we should want to add that overhead, and > (b) I don't see what that has to do with the problem you actually > need to solve, specifically limiting the application of implicit > domain-to-base-type casts. This is a new feature idea, derived from the implementation of domains. Usually people store type checking in the input functions, but this is a nice addition to UDTs that require a constraint checking model. It allows the constraints to be in plperl which is nice for parsing complex object stored at strings. (My example was email and the constraint was a plperl function that validated the format and legitimacy of the value.) Other complex objects (stored as strings) such as key value lists and your ordinary weirdly constructed values can use the (more expensive) constraint at constraint time only instead of the input function which should remain fast and may be a borrowed or inherited input function. This is not a drop dead required feature but it should flow from the cleaner implementation of domains. Changing the check from domain type to constraint exists on any type should be cleaner. Changing the SQL for CREATE TYPE should be the added work to get this feature available. It just seems simpler and cleaner. We want to treat all types the same and maintain a type blind database server. a) if subtypes/domains can have constraints then the model should not be different for domains only but for all types. Constraint checking would only occur at check constraint points--and there for any type. You already check for the existance of a domain. Change that test to the existence of a constraint only and eliminate domain specific code. b) It is not part of the problem but a logical stretch given the changes required. It will also reduce the domain checking. Other than my existing tests (published previously) I do not have a good idea code wise the extent of the changes. This discussion may help us get to that point. > > regards, tom lane > > ---(end of broadcast)--- > TIP 2: Don't 'kill -9' the postmaster > elein [EMAIL PROTECTED] ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Fixed length data types issue
Martijn van Oosterhout wrote: -- Start of PGP signed section. > On Fri, Sep 08, 2006 at 09:28:21AM -0400, [EMAIL PROTECTED] wrote: > > > But that won't help in the example you posted upthread, because char(N) > > > is not fixed-length. > > > > It can be fixed-length, or at least, have an upper bound. If marked > > up to contain only ascii characters, it doesn't, at least in theory, > > and even if it is unicode, it's not going to need more than 4 bytes > > per character. char(2) through char(16) only require 4 bits to > > store the length header, leaving 4 bits for encoding information. > > bytea(2) through bytea(16), at least in theory, should require none. > > If your talking about an upper-bound, then it's not fixed length > anymore, and you need to expend bytes storing the length. ASCII bytes > only take one byte in most encodings, include UTF8. > > Doodling this morning I remember why the simple approach didn't work. > If you look at the varlena header, 2 bits are reserved. Say you take > one bit to indicate "short header". Then lengths 0-31 bytes can be > represented with a one byte header, yay! > > However, now you only have enough bits leftover to store 29 bits for > the length, so we've just cut the maximum datum size from 1GB to 512MB. > Is that a fair trade? Probably not, so you'd need a more sophisticated > scheme. I was hoping we could have both bits true mean short header, but that is also used by our system to indicate compressed and TOAST usage. For testing, I would just grab a bit and see how thing go. -- Bruce Momjian [EMAIL PROTECTED] EnterpriseDBhttp://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PATCHES] [HACKERS] Fix linking of OpenLDAP libraries
"Albe Laurenz" <[EMAIL PROTECTED]> writes: > I did not only add them to the Makefile for interfaces/libpq, > but also everywhere something is linked against libpq in case > somebody links static. I intensely dislike that part of the patch, but will work on applying the rest. If we do need to start mentioning all of libpq's dependencies everywhere it's linked, I think it's time for a generic solution to that, instead of hacking each such place over again every time a new dependency pops up. But at the moment I'm unconvinced that we need to do it. regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
[HACKERS] Proposal for GUID datatype
Folks, I would like to submit the following proposal regarding the implementation of the GUID datatype. Based on the findings, thoughts and the discussion we have had in the past, I am going to propose the following: 1) Datatype name would be "uuid" or "guid". example: create table tbl (fld uuid, fld2 ); 2) Accepted input/output datatype and formats: The input/output datatype would be string(36) 2a) Three input formats are supported. example: insert into tbl (fld) values('1dfb39af-b56a-40b8-a903-b5b31567c3ce'); insert into tbl (fld) values('{1dfb39af-b56a-40b8-a903-b5b31567c3ce}'); insert into tbl (fld) values('1dfb39afb56a40b8a903b5b31567c3ce'); 2b) Only one default output format is supported. example: # select fld from tbl; fld --+ 1dfb39af-b56a-40b8-a903-b5b31567c3ce | 2b.a) An additional function will be available to provide other output formats or an existing function like to_char will support the additional formatting options. 3) Internal datatype Because there is not going to be any kind of (mathematically meaningful) calculation on the guid values, the internal datatype would be just a simple 16 byte unsigned char (uint8). This would help when comparing these values and can also be compressed inline Proposed data structure would be: typedef struct uuid_t { char data[16]; } uuid_t; 4) Comparing functions and operators The default comparing functions and operators like = < != > etc, etc.. would be implemented as required. Note that guid >= guid would not mean anything. The values will internally be compared as strings. 5) support functions: because uuid could also be used as PK or unique values, additional function(s) will be available to produce a uuid value to be used in a field's default value like sequences or PL/pgSQL etc.. etc... example; create table tbl( ID uuid default ('new_uuid()'), ); 5.a) If needed an additional macro-type like SERIAL could also be developed in later stage. 6) pg_type layout: typname = uuid typnamespace = pg_catalog typowner = (default) // db owner typlen = 16 typbyval = FALSE // type is byref typtype = b // built-in type typisdefiled = true typdelim = ',' // ',' seperator for array of uuid typrelid = 0 typelem = 0 typinput = to be defined later typoutput = to be defined later typreceive = not supported typsend = not supported typanalyze = 0 // default analyze typalign = c typstorage = m // stored compressed inline typnotnull = false // can be null other pg_type attributes are set to default values. Please send your comments and suggestions to complete or modify this proposal. Regards, Gevik ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Fixed length data types issue
Bruce Momjian wrote: No one has mentioned that we page value on disk to match the CPU alignment. This is done for efficiency, but is not strictly required. From time to time the idea of a logical vs physical mapping for columns has been mentioned. Among other benefits, that might allow us to do some rearrangement of physical ordering to reduce space wasted on alignment in some cases. There might be a small addition on computation required, but I suspect it would be lost in the noise, and swamped by any increased efficiency we got from putting more tuples in a page. cheers andrew ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Fixed length data types issue
Bruce Momjian <[EMAIL PROTECTED]> writes: > No one has mentioned that we page value on disk to match the CPU > alignment. This is done for efficiency, but is not strictly required. Well, it is unless you are willing to give up support of non-Intel CPUs; most other popular chips are strict about alignment, and will fail an attempt to do a nonaligned fetch. The only way we could pack stuff without alignment is to go over to the idea that memory and disk representations are different --- where in this case the "conversion" might just be a memcpy to a known-aligned location. The performance costs of that seem pretty daunting, however, especially when you reflect that simply stepping over a varlena field would require memcpy'ing its length word to someplace. regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Fixed length data types issue
Gregory Stark wrote: > Bruce Momjian <[EMAIL PROTECTED]> writes: > > > Gregory Stark wrote: > > > But I think this is a dead-end route. What you're looking at is the > > > number "1" > > > repeated for *every* record in the table. And what your proposing amounts > > > to > > > noticing that the number "4" fits in a byte and doesn't need a whole word > > > to > > > store it. Well sure, but you don't even need a byte if it's going to be > > > the > > > same for every record in the table. > > > > > > If someone popped up on the list asking about whether Postgres compressed > > > their data efficiently if they stored a column that was identical > > > throughout > > > the whole table you would tell them to normalize their data. > > > > I am confused. You don't want to shrink the header but instead compress > > duplicate values in the same row to a single entry? > > I think we have to find a way to remove the varlena length header entirely for > fixed length data types since it's going to be the same for every single > record in the table. What fixed-length data type has a header? > It might be useful to find a way to have 1-byte or 2-byte length headers too > since I suspect most legitimately variable columns like text or array[] are > also gong to be under 256 bytes. I think the point you are making is that fixed length fields, like GUID, don't need a header, while short fields like VARCHAR() and NUMERIC() need some shorter header. No one has mentioned that we page value on disk to match the CPU alignment. This is done for efficiency, but is not strictly required. -- Bruce Momjian [EMAIL PROTECTED] EnterpriseDBhttp://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Query returning tuples that does not satisfy the WHERE clause
Manuel Sugawara writes: > I don't think so, search_path has its default value and also I'm able > to reproduce it in a fresh cluster (tried 3 different machines to > discard hardware problems). Hm, well I'm willing to take a look if you can provide me access to the problem database running on a debug-enabled Postgres build. The fresh-cluster test seems to rule out my other idea about a corrupt index (though that was shaky anyway considering both plans use the same index...) regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Fixed length data types issue
[EMAIL PROTECTED] wrote: > I think I've been involved in a discussion like this in the past. Was > it mentioned in this list before? Yes the UTF-8 vs UTF-16 encoding > means that UTF-8 applications are at a disadvantage when using the > library. UTF-16 is considered more efficient to work with for everybody > except ASCII users. :-) Uh, is it? By whom? And why? -- Alvaro Herrerahttp://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Fixed length data types issue
On Fri, Sep 08, 2006 at 12:57:29PM -0400, Tom Lane wrote: > Ah, you're right, I did misunderstand that. However, it's still > apparently the case that ICU works mostly with UTF16 and handles other > encodings only via conversion to UTF16. That's a pretty serious > mismatch with our needs --- we'll end up converting to UTF16 all the > time. We're certainly not going to change to using UTF16 as the actual > native string representation inside the backend, both because of the > space penalty and incompatibility with tools like bison. No need to do anything like that. We'd probably use the u_strCompareIter() interface, where the two strings are defined as iterators. We setup the iterator to understand whatever charset postgres is currently running. Many of the other function have iterator versions also, so you can avoid UTF-16 entirely if you like. Have a nice day, -- Martijn van Oosterhout http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to > litigate. signature.asc Description: Digital signature
Re: [HACKERS] Domains and subtypes, a brief proposal
elein <[EMAIL PROTECTED]> writes: > Domains and subtypes. >* Create new child type from values in parent type. >* Maintain only checks for constraints >* Create implicit casts from child to parent This seems a bit content-free, because it's not clear how it differs from what we do now. We already have implicit child-to-parent casts. > Constraints on types: >* Change the pg_types to hold a NULLABLE constraint text column > OR add a type constraint lookup table (pg_domains?) I understand that you are arguing to allow constraints to be associated with any type not only domains, but (a) I don't see why we should want to add that overhead, and (b) I don't see what that has to do with the problem you actually need to solve, specifically limiting the application of implicit domain-to-base-type casts. regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Query returning tuples that does not satisfy the WHERE clause
Tom Lane <[EMAIL PROTECTED]> writes: > You seem to have worse problems than that, because as given the > insc_registra_grupo function never returns non-NULL Actually the function (and the database) is quite complex and was trimed just to test the problem. > I suspect pilot error --- perhaps looking at the wrong schema or > some such? I don't think so, search_path has its default value and also I'm able to reproduce it in a fresh cluster (tried 3 different machines to discard hardware problems). Regards, Manuel. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Fixed length data types issue
On Fri, Sep 08, 2006 at 12:57:29PM -0400, Tom Lane wrote: > Martijn van Oosterhout writes: > >> AFAICT, most of the useful operations work on UChar, which is uint16: > >> http://icu.sourceforge.net/apiref/icu4c/umachine_8h.html#6bb9fad572d65b30= > > 5324ef288165e2ac > > Oh, you're confusing UCS-2 with UTF-16, > Ah, you're right, I did misunderstand that. However, it's still > apparently the case that ICU works mostly with UTF16 and handles other > encodings only via conversion to UTF16. That's a pretty serious > mismatch with our needs --- we'll end up converting to UTF16 all the > time. We're certainly not going to change to using UTF16 as the actual > native string representation inside the backend, both because of the > space penalty and incompatibility with tools like bison. I think I've been involved in a discussion like this in the past. Was it mentioned in this list before? Yes the UTF-8 vs UTF-16 encoding means that UTF-8 applications are at a disadvantage when using the library. UTF-16 is considered more efficient to work with for everybody except ASCII users. :-) No opinion on the matter though. Changing PostgreSQL to UTF-16 would be an undertaking... :-) Cheers, mark -- [EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] __ . . _ ._ . . .__. . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/|_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... http://mark.mielke.cc/ ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Query returning tuples that does not satisfy the WHERE clause
Manuel Sugawara writes: > Using the second plan the query is returning tuples where >a=F1o_semestre <> 20071 You seem to have worse problems than that, because as given the insc_registra_grupo function never returns non-NULL, and so the query ought not be returning any tuples at all. I suspect pilot error --- perhaps looking at the wrong schema or some such? regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Domains and subtypes, a brief proposal
On Thu, Sep 07, 2006 at 07:12:17PM -0700, Josh Berkus wrote: > Elein, > > > I may have missed some stuff here. Obviously. For example how to divide > > and conquer the various aspects of the issues raised here. But this is a > > high, high level proposal at this time. > > I'm not quite clear on what in your proposal is different from current Domain > behavior. Or are you just looking to remove the limitations on where Domains > can be used? > > -- > Josh Berkus > PostgreSQL @ Sun > San Francisco > I'm looking to make domains proper types and eliminate the need for special domain checking in the case where it is not a check constraint check. I'm trying to influence a more logical implementation of domains that removes some code and gains us features. Also it should eliminate some (all?) of the current limitations with regards to using domains as subtypes. It will also pave the way for implementation of create type under type. --elein [EMAIL PROTECTED] ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
[HACKERS] Query returning tuples that does not satisfy the WHERE clause
Hi all, I'm running PostgreSQL v 8.1.4 and found a query that returns tuples that does not satisfy the WHERE clause, the query is: select * into errores_20071 from ( select r.id, r.trayectoria_id, r.grupo_id, regacd.insc_registra_grupo(trayectoria_id, grupo_id, true, false, true) as error from regacd.registro r join regacd.grupo g on (g.id = r.grupo_id) where g.año_semestre = 20071 and g.tipo_id = 'a') x where error is not null; A self-contained database schema is here: schema-registro.sql Description: Binary data Unfortunately I cannot post the data set but I'm willing to give access to my machine to test the problem. While trying to create a self contained test case I found that the query returns the correct answer before analyzing: QUERY PLAN Nested Loop (cost=41.61..31193.44 rows=36 width=12) -> Index Scan using "AsignaturClaveGrupoÚnicaPorAñoSemestre2" on grupo g (cost=0.00..14.03 rows=3 width=4) Index Cond: (("año_semestre" = 20071) AND (tipo_id = 'a'::"char")) -> Bitmap Heap Scan on registro r (cost=41.61..10305.22 rows=7031 width=12) Recheck Cond: ("outer".id = r.grupo_id) Filter: (insc_registra_grupo(trayectoria_id, grupo_id, true, false, true) IS NOT NULL) -> Bitmap Index Scan on registro_grupo (cost=0.00..41.61 rows=7031 width=0) Index Cond: ("outer".id = r.grupo_id) (8 filas) but does not after I run analyze: QUERY PLAN Hash Join (cost=1166.75..44109.74 rows=34184 width=12) Hash Cond: ("outer".grupo_id = "inner".id) -> Seq Scan on registro r (cost=0.00..28538.85 rows=1397684 width=12) Filter: (insc_registra_grupo(trayectoria_id, grupo_id, true, false, true) IS NOT NULL) -> Hash (cost=1159.54..1159.54 rows=2883 width=4) -> Bitmap Heap Scan on grupo g (cost=31.30..1159.54 rows=2883 width=4) Recheck Cond: (("año_semestre" = 20071) AND (tipo_id = 'a'::"char")) -> Bitmap Index Scan on "AsignaturClaveGrupoÚnicaPorAñoSemestre2" (cost=0.00..31.30 rows=2883 width=0) Index Cond: (("año_semestre" = 20071) AND (tipo_id = 'a'::"char")) (9 filas) Using the second plan the query is returning tuples where año_semestre <> 20071 Any help will be appreciated. Best regards, Manuel. ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Fixed length data types issue
Martijn van Oosterhout writes: >> AFAICT, most of the useful operations work on UChar, which is uint16: >> http://icu.sourceforge.net/apiref/icu4c/umachine_8h.html#6bb9fad572d65b30= > 5324ef288165e2ac > Oh, you're confusing UCS-2 with UTF-16, Ah, you're right, I did misunderstand that. However, it's still apparently the case that ICU works mostly with UTF16 and handles other encodings only via conversion to UTF16. That's a pretty serious mismatch with our needs --- we'll end up converting to UTF16 all the time. We're certainly not going to change to using UTF16 as the actual native string representation inside the backend, both because of the space penalty and incompatibility with tools like bison. regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] postgresql shared buffers
Praveen Kumar N wrote: > > one more doubt.in the following example we dont need sort right.But in > this case also relations were scanned sequentially one by one. In this case it's because the result from one seqscan was materialized. > So is it the > case that any relation is accessed only once from database while executing > a given query? Not in general -- you'll see that behavior only in particular cases. -- Alvaro Herrerahttp://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] postgresql shared buffers
one more doubt.in the following example we dont need sort right.But in this case also relations were scanned sequentially one by one.So is it the case that any relation is accessed only once from database while executing a given query? praveen=# explain select count(*) from a_9000_0,b_9000_0; QUERY PLAN -- Aggregate (cost=537566595.00..537566595.01 rows=1 width=0) -> Nested Loop (cost=7616.00..485726595.00 rows=2073600 width=0) -> Seq Scan on a_9000_0 (cost=0.00..6979.00 rows=144000 width=0) -> Materialize (cost=7616.00..9549.00 rows=144000 width=0) -> Seq Scan on b_9000_0 (cost=0.00..6979.00 rows=144000 width=0) (5 rows) Regards, Praveen On Fri, 8 Sep 2006, Heikki Linnakangas wrote: Date: Fri, 08 Sep 2006 15:30:37 +0100 From: Heikki Linnakangas <[EMAIL PROTECTED]> To: Praveen Kumar N <[EMAIL PROTECTED]> Cc: pgsql-hackers@postgresql.org Subject: Re: [HACKERS] postgresql shared buffers Praveen Kumar N wrote: On Fri, 8 Sep 2006, Heikki Linnakangas wrote: Looks like Alvaro guessed right. It reads both relations in sequence, sorts them in temporary storage, outside bufmgr, and then does a merge join on the sorted inputs. could you tell me how can we trace that? I mean which functions shall I checkout for that. The sort code is in src/backend/utils/sort/tuplesort.c and logtape.c. Can't remember function names from the top of my head. -- N Praveen Kumar Btech-IV CSE IIIT,Hyd AP,India Imagination is more important than knowledge... --Albert Einstein ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Fixed length data types issue
On Fri, Sep 08, 2006 at 12:19:19PM -0400, Tom Lane wrote: > Martijn van Oosterhout writes: > > On Fri, Sep 08, 2006 at 10:35:58AM -0400, Tom Lane wrote: > >> what's more, the docs suggest that it doesn't support anything wider > >> than UTF16. > > > Well, that's not true, which part of the docs were you looking at? > > AFAICT, most of the useful operations work on UChar, which is uint16: > http://icu.sourceforge.net/apiref/icu4c/umachine_8h.html#6bb9fad572d65b305324ef288165e2ac Oh, you're confusing UCS-2 with UTF-16, UCS-2 is a subset of UTF-16 that only handles the basic plane. Just like no-one is surprised that UTF-8 handles more than 256 characters, it shouldn't surprise you that UTF-16 handles more than 65536. ICU hasn't used UCS-2 since 1996. It's in the FAQ: http://icu.sourceforge.net/userguide/icufaq.html Have a nice day, -- Martijn van Oosterhout http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to > litigate. signature.asc Description: Digital signature
Re: [HACKERS] Fixed length data types issue
Martijn van Oosterhout writes: > On Fri, Sep 08, 2006 at 10:35:58AM -0400, Tom Lane wrote: >> what's more, the docs suggest that it doesn't support anything wider >> than UTF16. > Well, that's not true, which part of the docs were you looking at? AFAICT, most of the useful operations work on UChar, which is uint16: http://icu.sourceforge.net/apiref/icu4c/umachine_8h.html#6bb9fad572d65b305324ef288165e2ac regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Fixed length data types issue
On Fri, Sep 08, 2006 at 10:35:58AM -0400, Tom Lane wrote: > The reason this is a relevant consideration: we are talking about > changes that would remove existing functionality for people who don't > have that library. Huh? If you don't select ICU at compile time you get no difference from what we have now. I'm not sure I'm seeing your point. My COLLATE patches did allow both to coexist, but no-one appeared to like that idea either. > I suppose it might be possible to do > #ifdef HAVE_ICU > ... new code ... > #else > ... existing code ... > #endif > but given the differences in API I can't believe this would be readable > or maintainable. That's what the patch does. And the api differences are marginal. They even have C compatability functions to make it easier. > Another problem is that AFAICT, depending on ICU would force us to > standardize on Unicode as the *only* server internal encoding; Huh? You can use whatever encoding you like... Actual collations are determined on the basis of unicode properties, but I don't think that is what you're referring to. > what's more, the docs suggest that it doesn't support anything wider > than UTF16. Well, that's not true, which part of the docs were you looking at? Have a nice day, -- Martijn van Oosterhout http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to > litigate. signature.asc Description: Digital signature
[HACKERS] BF failure - kookaburra - ecpg
One of the new ecpg tests for comments fails to compile the resulting .c file, because my vendor C compiler doesn't like '//' style comments, when running in C mode. Specifically its line 12 of comment.pgc: // we also understand this style It seems like ecpg should translate the comment from '//' to '/* */' style, for the output .c file. Thanks, -rocco ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Fixed length data types issue
Martijn van Oosterhout writes: > I'm still missing the argument of why you can't just make a 16-byte > type. Around half the datatypes in postgresql are fixed-length and have > no header. I'm completely confused about why people are hung up about > bytea(16) not being fixed length when it's trivial to create a type > that is. Because by the time you have a CHAR(1), CHAR(2), CHAR(4), and CHAR(8) your head is already swimming trying to keep track of all the casts and cross-data-type comparators and you haven't even covered all the cases. If you define types just for the lengths up to 128 you would have 16,384 casts and 114,688 different cross-data-type comparisons just between them. Without them you wouldn't be able to have things like phone_number char(10) area_code char(3) and do things like: WHERE phone_number LIKE area_code||'%' And before you say so, sure this isn't the only way to do this and there are reasons why this may not be the best. But if you were shipping separate data types for char(3) and char(10) I think it would be a bug if the above didn't work. The problem is worse with numeric in that it would definitely be a bug if you couldn't use an index when comparing two numeric columns just because one had less precision than the other. There wouldn't be nearly as many types but even with just three such types you're already talking about hundreds of cross-data-type comparisons. Would others really consider shipping hundreds of new types to take care of this problem? I was looking for a more general solution. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Fixed length data types issue
Martijn van Oosterhout writes: > On Thu, Sep 07, 2006 at 04:57:04PM -0400, Gregory Stark wrote: >> Uhm, an ICU source tree is over 40 *megabytes*. > I don't understand this argument. No-one asked what size the LDAP > libraries were when we added support for them. No-one cares that > libssl/libcrypto is as large as glibc. The reason this is a relevant consideration: we are talking about changes that would remove existing functionality for people who don't have that library. People who don't have LDAP don't care that the PG sources have some LDAP functionality they're not getting, people who don't have SSL evidently don't care about that, etc. But there is existing, portable locale and multi-charset support in PG, and even though it's rather limited it's still useful. So you're telling people "to maintain the same functionality you have today, you will have to add this rather large library". That is only zero-cost from the perspective of someone who already has ICU installed; from everyone else, you should expect pushback. I suppose it might be possible to do #ifdef HAVE_ICU ... new code ... #else ... existing code ... #endif but given the differences in API I can't believe this would be readable or maintainable. Another problem is that AFAICT, depending on ICU would force us to standardize on Unicode as the *only* server internal encoding; what's more, the docs suggest that it doesn't support anything wider than UTF16. From the point of view of some of our far eastern users, both of those are serious steps backward. "Add large library, get *less* functionality" is an even harder sell. regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] postgresql shared buffers
Praveen Kumar N wrote: On Fri, 8 Sep 2006, Heikki Linnakangas wrote: Looks like Alvaro guessed right. It reads both relations in sequence, sorts them in temporary storage, outside bufmgr, and then does a merge join on the sorted inputs. could you tell me how can we trace that? I mean which functions shall I checkout for that. The sort code is in src/backend/utils/sort/tuplesort.c and logtape.c. Can't remember function names from the top of my head. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] postgresql shared buffers
On Fri, 8 Sep 2006, Heikki Linnakangas wrote: Date: Fri, 08 Sep 2006 15:22:19 +0100 From: Heikki Linnakangas <[EMAIL PROTECTED]> To: Praveen Kumar N <[EMAIL PROTECTED]> Cc: pgsql-hackers@postgresql.org Subject: Re: [HACKERS] postgresql shared buffers Praveen Kumar N wrote: Following is the output of query i have executed. Looks like Alvaro guessed right. It reads both relations in sequence, sorts them in temporary storage, outside bufmgr, and then does a merge join on the sorted inputs. could you tell me how can we trace that? I mean which functions shall I checkout for that. If you want to see the behavior you expected, I think you need to define indexes on a2 and b2, if you don't have them already, and coerce the planner to choose a nested loop join. I'd suggest using "SET enable_seqscan=false; SET enable_bitmapscan=false;" and see if that gets you a nested loop join. -- N Praveen Kumar Btech-IV CSE IIIT,Hyd AP,India Imagination is more important than knowledge... --Albert Einstein ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] postgresql shared buffers
Praveen Kumar N wrote: Following is the output of query i have executed. Looks like Alvaro guessed right. It reads both relations in sequence, sorts them in temporary storage, outside bufmgr, and then does a merge join on the sorted inputs. If you want to see the behavior you expected, I think you need to define indexes on a2 and b2, if you don't have them already, and coerce the planner to choose a nested loop join. I'd suggest using "SET enable_seqscan=false; SET enable_bitmapscan=false;" and see if that gets you a nested loop join. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] postgresql shared buffers
Following is the output of query i have executed. praveen=# explain select count(*) from a_9000_0,b_9000_0 where a2=b2; QUERY PLAN Aggregate (cost=1924635.42..1924635.43 rows=1 width=0) -> Merge Join (cost=109515.42..1665435.42 rows=10368 width=0) Merge Cond: (a_9000_0.a2 = b_9000_0.b2) -> Sort (cost=54757.71..55117.71 rows=144000 width=260) Sort Key: a_9000_0.a2 -> Seq Scan on a_9000_0 (cost=0.00..6979.00 rows=144000 width=260) -> Sort (cost=54757.71..55117.71 rows=144000 width=260) Sort Key: b_9000_0.b2 -> Seq Scan on b_9000_0 (cost=0.00..6979.00 rows=144000 width=260) (9 rows) On Fri, 8 Sep 2006, Heikki Linnakangas wrote: Date: Fri, 08 Sep 2006 14:57:57 +0100 From: Heikki Linnakangas <[EMAIL PROTECTED]> To: Praveen Kumar N <[EMAIL PROTECTED]> Cc: pgsql-hackers@postgresql.org Subject: Re: [HACKERS] postgresql shared buffers Praveen Kumar N wrote: Let me explain once more. I have two relations which are 10 times more than bufferpool size.I have observed the following things when joined that two relations(it using merge join to join both relations) 1.It first accessed system catalog tables 2.Relation 1 3.Relation 2 my doubt is one whole relation cant fit in the main memory.That too when we use merge join, it should keep some part of 1st relations and should scan second relation as bufferpool size is less compared to size of each relation.similarly for the remainin part of 1st relation.But it is not happening here.First whole Relation1 is scanned and then Relation 2 is scanned. Then how is it joining two relations using merge join? Am I missing something? Hmm. A hash join, maybe? You should do EXPLAIN on the query to see what it really does, otherwise we're just guessing. I traced scanning of relation by editing the functions ReadBuffer() and BufferAlloc(),StrategyGetBuffer(). That sounds valid. -- N Praveen Kumar Btech-IV CSE IIIT,Hyd AP,India Imagination is more important than knowledge... --Albert Einstein ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] postgresql shared buffers
Heikki Linnakangas wrote: > Praveen Kumar N wrote: > >Let me explain once more. > > > >I have two relations which are 10 times more than bufferpool size.I > >have observed the following things when joined that two relations(it > >using merge join to join both relations) > > > >1.It first accessed system catalog tables > >2.Relation 1 > >3.Relation 2 > > > >my doubt is one whole relation cant fit in the main memory.That too > >when we use merge join, it should keep some part of 1st relations and > >should scan second relation as bufferpool size is less compared to > >size of each relation.similarly for the remainin part of 1st > >relation.But it is not happening here.First whole Relation1 is scanned > >and then Relation 2 is scanned. Then how is it joining two relations > >using merge join? Am I missing something? > > Hmm. A hash join, maybe? You should do EXPLAIN on the query to see what > it really does, otherwise we're just guessing. Another option would be that the tuples are written to a sort tape, I think. -- Alvaro Herrerahttp://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] postgresql shared buffers
Praveen Kumar N wrote: Let me explain once more. I have two relations which are 10 times more than bufferpool size.I have observed the following things when joined that two relations(it using merge join to join both relations) 1.It first accessed system catalog tables 2.Relation 1 3.Relation 2 my doubt is one whole relation cant fit in the main memory.That too when we use merge join, it should keep some part of 1st relations and should scan second relation as bufferpool size is less compared to size of each relation.similarly for the remainin part of 1st relation.But it is not happening here.First whole Relation1 is scanned and then Relation 2 is scanned. Then how is it joining two relations using merge join? Am I missing something? Hmm. A hash join, maybe? You should do EXPLAIN on the query to see what it really does, otherwise we're just guessing. I traced scanning of relation by editing the functions ReadBuffer() and BufferAlloc(),StrategyGetBuffer(). That sounds valid. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Fixed length data types issue
On Fri, Sep 08, 2006 at 09:28:21AM -0400, [EMAIL PROTECTED] wrote: > > But that won't help in the example you posted upthread, because char(N) > > is not fixed-length. > > It can be fixed-length, or at least, have an upper bound. If marked > up to contain only ascii characters, it doesn't, at least in theory, > and even if it is unicode, it's not going to need more than 4 bytes > per character. char(2) through char(16) only require 4 bits to > store the length header, leaving 4 bits for encoding information. > bytea(2) through bytea(16), at least in theory, should require none. If your talking about an upper-bound, then it's not fixed length anymore, and you need to expend bytes storing the length. ASCII bytes only take one byte in most encodings, include UTF8. Doodling this morning I remember why the simple approach didn't work. If you look at the varlena header, 2 bits are reserved. Say you take one bit to indicate "short header". Then lengths 0-31 bytes can be represented with a one byte header, yay! However, now you only have enough bits leftover to store 29 bits for the length, so we've just cut the maximum datum size from 1GB to 512MB. Is that a fair trade? Probably not, so you'd need a more sophisticated scheme. > For my own uses, I would like for bytea(16) to have no length header. > The length is constant. UUID or MD5SUM. Store the length at the head > of the table, or look up the information from the schema. I'm still missing the argument of why you can't just make a 16-byte type. Around half the datatypes in postgresql are fixed-length and have no header. I'm completely confused about why people are hung up about bytea(16) not being fixed length when it's trivial to create a type that is. > I see the complexity argument. Existing code is too heavy to change > completely. People talking about compromises such as allowing the > on disk layout to be different from the in memory layout. The biggest cost of having differing memory and disk layouts is that you have to "unpack" each disk page as it's read it. This means an automatic doubling of memory usage for the buffer cache. If you're RAM limited, that's the last thing you want. Currently, the executor will use the contents of the actual disk page when possible, saving a lot of copying. Have a nice day, -- Martijn van Oosterhout http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to > litigate. signature.asc Description: Digital signature
Re: [HACKERS] postgresql shared buffers
Let me explain once more. I have two relations which are 10 times more than bufferpool size.I have observed the following things when joined that two relations(it using merge join to join both relations) 1.It first accessed system catalog tables 2.Relation 1 3.Relation 2 my doubt is one whole relation cant fit in the main memory.That too when we use merge join, it should keep some part of 1st relations and should scan second relation as bufferpool size is less compared to size of each relation.similarly for the remainin part of 1st relation.But it is not happening here.First whole Relation1 is scanned and then Relation 2 is scanned. Then how is it joining two relations using merge join? Am I missing something? I traced scanning of relation by editing the functions ReadBuffer() and BufferAlloc(),StrategyGetBuffer(). I hope now it is clear. thanks in anticipation. bye On Fri, 8 Sep 2006, Heikki Linnakangas wrote: Date: Fri, 08 Sep 2006 14:30:01 +0100 From: Heikki Linnakangas <[EMAIL PROTECTED]> To: Praveen Kumar N <[EMAIL PROTECTED]> Cc: pgsql-hackers@postgresql.org Subject: Re: [HACKERS] postgresql shared buffers Praveen Kumar N wrote: I have installed postgresql from sourcecode.I would like to know how pages are replaced in the bufferpool when we join two relations.I tried to trace it by editing files pgsql/src/backend/storage/buffer/(bufmgr.c,freelist.c).But i feel still I am missing some information after observing extracted information abt buffer replacement.My input datasize is 10times more than main memory/RAM size.When I joined two relations,postgresql accessed both relations sequentially one by one and that too only once.Then how is it joining two relations by accessing only once? Is it storing that accessed relations some where other than main memory/bufferpool(Becos they cant fit into main memory). What kind of a join is it? If it's a merge join, using indexes, it would only have to visit each heap page once. So can anybdy tell me is there is ne thing I am missing? Is there any concept like postgresql cache similar to kernel cache otherthan sharedbuffers.If so how can we figure it out. No. All access to relations (except temporary relations) go through bufmgr and the shared memory buffer cache. Is there any way by which postgresql is accessing database relations through,other than rotines in bufmgr.c nd freelist.c(I mean any other routines like ReadBuffer,StrategyGet etc.) No. -- N Praveen Kumar Btech-IV CSE IIIT,Hyd AP,India Imagination is more important than knowledge... --Albert Einstein ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Fixed length data types issue
On Fri, Sep 08, 2006 at 08:50:57AM +0200, Peter Eisentraut wrote: > Gregory Stark wrote: > > But it's largely true for OLTP applications too. The more compact the > > data the more tuples fit on a page and the greater the chance you > > have the page you need in cache. > But a linear amount of more RAM is still more affordable than a CPU that > is 100 times faster, which is about what some of the proposed schemes > would require. 100 times faster? I don't think it has been proven that a change in how data is stored would result in an increase in CPU usage. It's an assumption. It might be correct. It might not. I guess this is where patches speak louder than words... :-) Cheers, mark -- [EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] __ . . _ ._ . . .__. . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/|_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... http://mark.mielke.cc/ ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] postgresql shared buffers
Praveen Kumar N <[EMAIL PROTECTED]> writes: > hai... > > I have installed postgresql from sourcecode.I would like to know > how pages are replaced in the bufferpool when we join two relations.I tried to > trace it by editing files > pgsql/src/backend/storage/buffer/(bufmgr.c,freelist.c).But i feel still I > am missing some information after observing extracted information abt buffer > replacement. Try explain select ... The output may be enlightening. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] postgresql shared buffers
Praveen Kumar N wrote: I have installed postgresql from sourcecode.I would like to know how pages are replaced in the bufferpool when we join two relations.I tried to trace it by editing files pgsql/src/backend/storage/buffer/(bufmgr.c,freelist.c).But i feel still I am missing some information after observing extracted information abt buffer replacement.My input datasize is 10times more than main memory/RAM size.When I joined two relations,postgresql accessed both relations sequentially one by one and that too only once.Then how is it joining two relations by accessing only once? Is it storing that accessed relations some where other than main memory/bufferpool(Becos they cant fit into main memory). What kind of a join is it? If it's a merge join, using indexes, it would only have to visit each heap page once. So can anybdy tell me is there is ne thing I am missing? Is there any concept like postgresql cache similar to kernel cache otherthan sharedbuffers.If so how can we figure it out. No. All access to relations (except temporary relations) go through bufmgr and the shared memory buffer cache. Is there any way by which postgresql is accessing database relations through,other than rotines in bufmgr.c nd freelist.c(I mean any other routines like ReadBuffer,StrategyGet etc.) No. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Fixed length data types issue
On Fri, Sep 08, 2006 at 08:57:12AM +0200, Peter Eisentraut wrote: > Gregory Stark wrote: > > I think we have to find a way to remove the varlena length header > > entirely for fixed length data types since it's going to be the same > > for every single record in the table. > But that won't help in the example you posted upthread, because char(N) > is not fixed-length. It can be fixed-length, or at least, have an upper bound. If marked up to contain only ascii characters, it doesn't, at least in theory, and even if it is unicode, it's not going to need more than 4 bytes per character. char(2) through char(16) only require 4 bits to store the length header, leaving 4 bits for encoding information. bytea(2) through bytea(16), at least in theory, should require none. For my own uses, I would like for bytea(16) to have no length header. The length is constant. UUID or MD5SUM. Store the length at the head of the table, or look up the information from the schema. I see the complexity argument. Existing code is too heavy to change completely. People talking about compromises such as allowing the on disk layout to be different from the in memory layout. I wonder whether the change could be small enough to not significantly increase CPU, while still having significant effect. I find myself doubting the CPU bound numbers. If even 20% data is saved, this means 20% more RAM for caching, 20% less pages touched when scanning, and 20% less RAM read. When people say CPU-bound, are we sure they do not mean RAM speed bound? How do they tell the difference between the two? RAM lookups count as CPU on most performance counters I've ever used. RAM speed is also slower than CPU speed, allowing for calculations between accesses assuming that the loop allows for prefetching to be possible and accurate. Cheers, mark -- [EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] __ . . _ ._ . . .__. . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/|_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... http://mark.mielke.cc/ ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
[HACKERS] postgresql shared buffers
hai... I have installed postgresql from sourcecode.I would like to know how pages are replaced in the bufferpool when we join two relations.I tried to trace it by editing files pgsql/src/backend/storage/buffer/(bufmgr.c,freelist.c).But i feel still I am missing some information after observing extracted information abt buffer replacement.My input datasize is 10times more than main memory/RAM size.When I joined two relations,postgresql accessed both relations sequentially one by one and that too only once.Then how is it joining two relations by accessing only once? Is it storing that accessed relations some where other than main memory/bufferpool(Becos they cant fit into main memory). So can anybdy tell me is there is ne thing I am missing? Is there any concept like postgresql cache similar to kernel cache otherthan sharedbuffers.If so how can we figure it out. Is there any way by which postgresql is accessing database relations through,other than rotines in bufmgr.c nd freelist.c(I mean any other routines like ReadBuffer,StrategyGet etc.) thanks in anticipation. -- N Praveen Kumar Btech-IV CSE IIIT,Hyd AP,India Imagination is more important than knowledge... --Albert Einstein ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Fixed length data types issue
On Fri, Sep 08, 2006 at 02:14:58PM +0200, Peter Eisentraut wrote: > So "mathematically", you are right, the collation is a property of the > operation, not of the operands. But semantically, the operands do > carry the information of what collation order they would like to be > compared under, and if two pieces of data with different choices meet, > you need an override. Sure, but SQL COLLATE handles all that just fine. At no point is the collation a property of the operands. At best is a property of the source of the operands but can be overridden at any point. SQL also covers the case where there is ambiguity, and the writer of the query has to clarify. Collation is hard precisly because it's not a property of the operands, which makes it very difficult to make postgresql do it. > Incidentally, if you buy into that, this would also neatly solve the > problem of how to arrange for column-specific case conversion rules, > which SQL does not address at all. SQL does say that UPPER and LOWER should be handled by Unicode rules, however the notes do mention that they should probably pay attention to the collation and character set, since the results are dependant on them. Have a nice day, -- Martijn van Oosterhout http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to > litigate. signature.asc Description: Digital signature
Re: [HACKERS] Fixed length data types issue
Peter Eisentraut wrote: The real problem is that the established method dividing up the locale categories ignores both the technological and the linguistic reality. In reality, all properties like lc_collate, lc_ctype, and lc_numeric are dependent on the property "language of the text". I don't buy that. lc_collate, lc_ctype and lc_numeric are certainly related, but they're not a property of the "language of the text". For example, imagine an employee database for an international company. When a user wants to print out a sorted list of employees, the language of the text in the database (name of an employee) is irrelevant. A german user would like to see the names in different order than an English-speaking user. I've seen this in practice. Also, see: http://www.unicode.org/unicode/reports/tr10/#Common_Misperceptions for another example. In general, it doesn't make sense to sort a text by Spanish rules, downcase by Turkish rules, and embed numbers using English punctuation. Of course you can do all that, but it's generally not very useful and might give inconsistent results. (For extra credit: how do you do case-insensitive sorts with inconsistent lc_collate and lc_ctype settings?) Sure. Don't do that, that's just silly. But I don't see how that's relevant. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Fixed length data types issue
Heikki Linnakangas wrote: > have a default set per-database, per-table or per-column, but it's > not a property of the actual value of a field. I think that the > phrase "collation of a string" doesn't make sense. The real problem is that the established method dividing up the locale categories ignores both the technological and the linguistic reality. In reality, all properties like lc_collate, lc_ctype, and lc_numeric are dependent on the property "language of the text". In general, it doesn't make sense to sort a text by Spanish rules, downcase by Turkish rules, and embed numbers using English punctuation. Of course you can do all that, but it's generally not very useful and might give inconsistent results. (For extra credit: how do you do case-insensitive sorts with inconsistent lc_collate and lc_ctype settings?) So "mathematically", you are right, the collation is a property of the operation, not of the operands. But semantically, the operands do carry the information of what collation order they would like to be compared under, and if two pieces of data with different choices meet, you need an override. Incidentally, if you buy into that, this would also neatly solve the problem of how to arrange for column-specific case conversion rules, which SQL does not address at all. -- Peter Eisentraut http://developer.postgresql.org/~petere/ ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] large object regression tests
Hi, Jeremy, Jeremy Drake wrote: > I am considering, and I think that in order to get a real test of the > large objects, I would need to load data into a large object which would > be sufficient to be loaded into more than one block (large object blocks > were 1 or 2K IIRC) so that the block boundary case could be tested. Is > there any precedent on where to grab such a large chunk of data from? You could generate such data on the fly, as part of the test scripts. E. G. a blob of zero bytes, blob of 0xff bytes, a blob of pseudo random data... Markus -- Markus Schaber | Logical Tracking&Tracing International AG Dipl. Inf. | Software Development GIS Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org signature.asc Description: OpenPGP digital signature
Re: [HACKERS] Fixed length data types issue
On Thu, Sep 07, 2006 at 04:57:04PM -0400, Gregory Stark wrote: > Uhm, an ICU source tree is over 40 *megabytes*. That's almost as much as the > rest of Postgres itself and that doesn't even include documentation. Even if > you exclude the data and regression tests you're still talking about depending > on the portability and correctness of over 10 megabytes of new code. I don't understand this argument. No-one asked what size the LDAP libraries were when we added support for them. No-one cares that libssl/libcrypto is as large as glibc. What size the libraries are that postgresql uses is somewhat irrelevent. It's not like we're forcing people to install them. > Neither is ICU available on most platforms. In any case we only need strcoll_l > as a performance optimization, the regular interface works, it's just slow. Can you point me to a common platform where postgresql runs and ICU doesn't? http://dev.icu-project.org/cgi-bin/viewcvs.cgi/icu/readme.html?rev=HEAD#HowToBuildSupported The only one I can see in the buildfarm that isn't mentioned is Unixware. > Well equal is part of collation at least in the sense you mean. What it > doesn't help with is things like tolower or regexp matching. These are the > things that I would suggest you usually want to be doing on the client because > SQL's string manipulation facilities are so poor compared to most client > languages. If I specify a collation where case and accents are ignored, then GROUP BY should ignore them too, and regexps should honour that. Moving all this to the client doesn't seem like a good move at all. Have a nice day, -- Martijn van Oosterhout http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to > litigate. signature.asc Description: Digital signature
Re: [HACKERS] Fixed length data types issue
On Fri, Sep 08, 2006 at 11:58:59AM +0100, Heikki Linnakangas wrote: > Martijn van Oosterhout wrote: > >I think that if SQL COLLATE gets in we'll get this almost for free. > >Collation and charset are both properties of strings. Once you've got a > >mechanism to know the collation of a string, you just attach the > >charset to the same place. The only difference is that changing charsets > >requires recoding, wheres changing collation does not. > > Not quite. Collation is a property of the operation that you're doing. > For example, if you're doing a sort, you might do it in different > collation depending on the user that's doing it, or it might even be > chosen by the user case-by-case. Of course, usually you have a default > set per-database, per-table or per-column, but it's not a property of > the actual value of a field. I think that the phrase "collation of a > string" doesn't make sense. Sorry, you're quite right. The collation is the property of an executor node, or at least that's how I thought of it while I was working on it. By that I mean that each source (say column) has a defined value for collation and charset (the SQL defines the rules for determining collation, don't know about charset). At each point in the query you can point at the charset and collation applying to that node. However, I think my point that charset and collations could be treated via the same mechanism is still valid. If I get time I might rework the COLLATE spec I wrote to include charset stuff. Have a nice day, -- Martijn van Oosterhout http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to > litigate. signature.asc Description: Digital signature
Re: [HACKERS] A note about buildfarm ecpg-check
On Fri, Sep 08, 2006 at 05:38:27AM -0400, Andrew Dunstan wrote: > We can, although I have tried to keep the order fairly logical. Are we > seeing so many ECPG failures that it's a major blocker? bustard has had > one ECPG failure in the last 5 days. We are currently only reporting > failures on OpenBSD and the mipsel box. The mipsel one is fixed too. So except for the OpenBSD strtod bug ecpg-check seems to run cleanly on all machines. However, I plan to do some more changes to the test suite. Nothing major, just trying to get rid of the complex tests that test so many features at the same time and divide them into several testcases. This might give us a red from time to time, but hopefully nothing major. Michael -- Michael Meskes Email: Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org) ICQ: 179140304, AIM/Yahoo: michaelmeskes, Jabber: [EMAIL PROTECTED] Go SF 49ers! Go Rhein Fire! Use Debian GNU/Linux! Use PostgreSQL! ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Fixed length data types issue
Martijn van Oosterhout wrote: I think that if SQL COLLATE gets in we'll get this almost for free. Collation and charset are both properties of strings. Once you've got a mechanism to know the collation of a string, you just attach the charset to the same place. The only difference is that changing charsets requires recoding, wheres changing collation does not. Not quite. Collation is a property of the operation that you're doing. For example, if you're doing a sort, you might do it in different collation depending on the user that's doing it, or it might even be chosen by the user case-by-case. Of course, usually you have a default set per-database, per-table or per-column, but it's not a property of the actual value of a field. I think that the phrase "collation of a string" doesn't make sense. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Fixed length data types issue
On Fri, Sep 08, 2006 at 05:54:01AM -0400, Andrew Dunstan wrote: > >The encoding is set per-database. Even if you need UTF-8 to encode > >user-supplied strings, there can still be many small ASCII fields in > >the database. Country code, currency code etc. > > ISTM we should revisit this when we get per-column encoding. I think that if SQL COLLATE gets in we'll get this almost for free. Collation and charset are both properties of strings. Once you've got a mechanism to know the collation of a string, you just attach the charset to the same place. The only difference is that changing charsets requires recoding, wheres changing collation does not. I think it'd just become a special case of the Relabel node. Have a nice day, -- Martijn van Oosterhout http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to > litigate. signature.asc Description: Digital signature
Re: [HACKERS] Fixed length data types issue
Heikki Linnakangas wrote: Gregory Stark wrote: But why would you use UTF8 to encode fixed length ascii strings? The encoding is set per-database. Even if you need UTF-8 to encode user-supplied strings, there can still be many small ASCII fields in the database. Country code, currency code etc. ISTM we should revisit this when we get per-column encoding. cheers andrew ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Fixed length data types issue
Gregory Stark wrote: But why would you use UTF8 to encode fixed length ascii strings? The encoding is set per-database. Even if you need UTF-8 to encode user-supplied strings, there can still be many small ASCII fields in the database. Country code, currency code etc. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] A note about buildfarm ecpg-check
Tom Lane wrote: I see that the buildfarm script seems to be running ecpg-check pretty early in the sequence. Considering that the ecpg tests are still far from stable, this seems to be taking away the opportunity to learn as much as we can from a buildfarm run. Could we run the ecpg tests last? We can, although I have tried to keep the order fairly logical. Are we seeing so many ECPG failures that it's a major blocker? bustard has had one ECPG failure in the last 5 days. We are currently only reporting failures on OpenBSD and the mipsel box. Anyway, I have made the change in CVS and buildfarm members can upgrade to CVS version 1.69 of run_build.pl. I guess that would mainly matter for the owners of the currently failing 3 boxes. If it proves to be necessary beyond a small time frame I will cut a new release. An even better idea would be to teach the script about test dependencies so that it could run test steps even when an earlier-but-unrelated test had failed. But I'm sure that's a lot more work. Yes. It might be possible with the modularisation work I'm contemplating, but that's some way off, and I'm not going to hack it in right now. cheers andrew ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Fixed length data types issue
Gregory Stark wrote: > > But that won't help in the example you posted upthread, because > > char(N) is not fixed-length. > > Sure it is because any sane database--certainly any sane database > using char(N)--is in C locale anyways. This matter is completely independent of the choice of locale and therefore any unilateral redefinition of sanity that you might come up with. -- Peter Eisentraut http://developer.postgresql.org/~petere/ ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Fixed length data types issue
Gregory Stark <[EMAIL PROTECTED]> writes: > Peter Eisentraut <[EMAIL PROTECTED]> writes: > > > Gregory Stark wrote: > > > > But that won't help in the example you posted upthread, because > > > > char(N) is not fixed-length. > > > > > > Sure it is because any sane database--certainly any sane database > > > using char(N)--is in C locale anyways. > > > > This matter is completely independent of the choice of locale and > > therefore any unilateral redefinition of sanity that you might come up > > with. > > Except it isn't. If you're dealing with fixed length ascii codes from existing > databases you interoperate with then you will have problems if you initialize > your database in a non-C locale. Interpreting those codes in your locale will > be do incorrect things like treat them as case insensitive or ignore spaces in > collation, etc. Oh, I think I misread your comment. You're saying the choice of encoding is independent of the choice of locale. Sure, if you're using UTF8 then how efficiently Postgres stores fixed length data types isn't terribly relevant to you. Just as it isn't relevant if you're storing other variable length data types. But why would you use UTF8 to encode fixed length ascii strings? -- greg ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Fixed length data types issue
On 2006-09-08, Gregory Stark <[EMAIL PROTECTED]> wrote: >> But that won't help in the example you posted upthread, because char(N) >> is not fixed-length. > > Sure it is because any sane database--certainly any sane database using > char(N)--is in C locale anyways. You're confusing locale and charset. -- Andrew, Supernews http://www.supernews.com - individual and corporate NNTP services ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Fixed length data types issue
Peter Eisentraut <[EMAIL PROTECTED]> writes: > Gregory Stark wrote: > > > But that won't help in the example you posted upthread, because > > > char(N) is not fixed-length. > > > > Sure it is because any sane database--certainly any sane database > > using char(N)--is in C locale anyways. > > This matter is completely independent of the choice of locale and > therefore any unilateral redefinition of sanity that you might come up > with. Except it isn't. If you're dealing with fixed length ascii codes from existing databases you interoperate with then you will have problems if you initialize your database in a non-C locale. Interpreting those codes in your locale will be do incorrect things like treat them as case insensitive or ignore spaces in collation, etc. -- greg ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Fixed length data types issue
Peter Eisentraut <[EMAIL PROTECTED]> writes: > Gregory Stark wrote: > > I think we have to find a way to remove the varlena length header > > entirely for fixed length data types since it's going to be the same > > for every single record in the table. > > But that won't help in the example you posted upthread, because char(N) > is not fixed-length. Sure it is because any sane database--certainly any sane database using char(N)--is in C locale anyways. In any case if you disagree about that (and you're wrong) then substitute some other data type. Defining such a data type may be part of the problem that has to be solved here. -- greg ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match