Re: [HACKERS] 'a' == 'a ' (Was: RE: [pgsql-advocacy] [GENERAL] Oracle buysInnobase)
Richard Huxton wrote: Dann Corbit wrote: I can see plenty of harm and absolutely no return. We are talking about blank padding before comparison. Do you really want 'Danniel ' considered distinct from 'Danniel ' in a comparison? In real life, what does that buy you? 100% YES! If two values are the same, then any function on those two values should return the same result. Otherwise what does equals mean? At the very least length() is broken by your argument. I agree completely. I would much rather be precise than intuitive. And, I have done applications where the padding length was important, especially when working with remote batch processing in the credit industries and the like. Writing queries to create and process fixed-width batch files is much easier if you can rely on these kinds of behaviors. Here it's CHAR that's broken IMHO - spawn of some punch-card spawned data processing rules of the 70s. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] 'a' == 'a ' (Was: RE: [pgsql-advocacy] [GENERAL] Oracle buysInnobase)
Dann Corbit wrote: I can see plenty of harm and absolutely no return. We are talking about blank padding before comparison. Do you really want 'Danniel ' considered distinct from 'Danniel ' in a comparison? In real life, what does that buy you? 100% YES! If two values are the same, then any function on those two values should return the same result. Otherwise what does equals mean? At the very least length() is broken by your argument. Here it's CHAR that's broken IMHO - spawn of some punch-card spawned data processing rules of the 70s. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] 'a' == 'a ' (Was: RE: [pgsql-advocacy] [GENERAL] Oracle buysInnobase)
Dann Corbit wrote: Try this query in Oracle, SQL*Server, DB/2, Informix, etc.: connxdatasync=# select 1 where cast('a' as varchar(30)) = cast('a ' as varchar(30)); ?column? -- (0 rows) I see how you can interpret the SQL Standard to make the above response a correct one. But is it the response that you would like? When the compared datatypes are VARCHAR: YES When the compared datatypes are fixed length like CHAR: NO Suppose, for instance, that you have an employee table in your database. You have another table called benefits. Perhaps it is even in another database. Conceivably even in a database other than PostgreSQL. Anyway, you want to match information between the two systems so you join on some redundant columns like a.First_Name = b.fname AND a.Last_name = b.lname AND a.Middle_Initial = b.mi AND a.City = b.city AND a.Street_name = b.street If the columns are not the same length (or one is fixed length and the other variable), then 'Danniel' won't match 'Danniel' unless you trim it. If you trim it, then the indexes go out the window. If the indexes go out the window, then we table scan. If you want to compare datatypes that are different, and you don't have an index that accomodates, that's the price you pay for comparing different data types. I don't like that behavior. Perhaps others who know more than me can say why not blank padding comparisons is a good idea. Clearly, one could argue that having redundant data is bad and that every attribute in a database intended to match should be exactly the same type. But there are lots of database systems badly designed. And of well designed systems, it is not uncommon to have more than one database in your organization, and a need to perform federated joins as well because of it. -Original Message- From: Tino Wildenhain [mailto:[EMAIL PROTECTED] Sent: Wednesday, October 19, 2005 1:05 PM To: Marc G. Fournier Cc: Dann Corbit; [EMAIL PROTECTED]; pgsql- [EMAIL PROTECTED]; pgsql-general@postgresql.org Subject: Re: 'a' == 'a ' (Was: RE: [pgsql-advocacy] [GENERAL] Oracle buysInnobase) Am Mittwoch, den 19.10.2005, 16:29 -0300 schrieb Marc G. Fournier: I'm CC'ng this over to -hackers ... Tom? Comments? On Wed, 19 Oct 2005, Dann Corbit wrote: Yes, clearly that is the wrong result according to the SQL standard. Here is a SQL*Server query: select 1 where 'a' = 'a ' AND 'a' = 'a ' AND 'a ' = 'a ' It returns (correctly): 1 -Original Message- From: [EMAIL PROTECTED] [mailto:pgsql-general- [EMAIL PROTECTED] On Behalf Of Marc G. Fournier Sent: Wednesday, October 19, 2005 11:41 AM To: [EMAIL PROTECTED] Cc: pgsql-general@postgresql.org Subject: Re: [pgsql-advocacy] [GENERAL] Oracle buys Innobase On Wed, 19 Oct 2005, [EMAIL PROTECTED] wrote: I was referring to trailing blanks, but did not explicitly say it, though showed it in the examples. I am pretty sure that the SQL standard says that trailing whitespace is insignificant in string comparison. Then we are broken too :) # select 'a ' = 'a '; ?column? -- f (1 row) experiment=# SELECT 'a '::char = 'a '::char; ?column? -- t ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match -- Terry Fielder [EMAIL PROTECTED] Associate Director Software Development and Deployment Great Gulf Homes / Ashton Woods Homes Fax: (416) 441-9085 ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] 'a' == 'a ' (Was: RE: [pgsql-advocacy] [GENERAL] Oracle buysInnobase)
Try this query in Oracle, SQL*Server, DB/2, Informix, etc.: connxdatasync=# select 1 where cast('a' as varchar(30)) = cast('a ' as varchar(30)); ?column? -- (0 rows) I see how you can interpret the SQL Standard to make the above response a correct one. But is it the response that you would like? Suppose, for instance, that you have an employee table in your database. You have another table called benefits. Perhaps it is even in another database. Conceivably even in a database other than PostgreSQL. Anyway, you want to match information between the two systems so you join on some redundant columns like a.First_Name = b.fname AND a.Last_name = b.lname AND a.Middle_Initial = b.mi AND a.City = b.city AND a.Street_name = b.street If the columns are not the same length (or one is fixed length and the other variable), then 'Danniel' won't match 'Danniel' unless you trim it. If you trim it, then the indexes go out the window. If the indexes go out the window, then we table scan. I don't like that behavior. Perhaps others who know more than me can say why not blank padding comparisons is a good idea. Clearly, one could argue that having redundant data is bad and that every attribute in a database intended to match should be exactly the same type. But there are lots of database systems badly designed. And of well designed systems, it is not uncommon to have more than one database in your organization, and a need to perform federated joins as well because of it. -Original Message- From: Tino Wildenhain [mailto:[EMAIL PROTECTED] Sent: Wednesday, October 19, 2005 1:05 PM To: Marc G. Fournier Cc: Dann Corbit; [EMAIL PROTECTED]; pgsql- [EMAIL PROTECTED]; pgsql-general@postgresql.org Subject: Re: 'a' == 'a ' (Was: RE: [pgsql-advocacy] [GENERAL] Oracle buysInnobase) Am Mittwoch, den 19.10.2005, 16:29 -0300 schrieb Marc G. Fournier: I'm CC'ng this over to -hackers ... Tom? Comments? On Wed, 19 Oct 2005, Dann Corbit wrote: Yes, clearly that is the wrong result according to the SQL standard. Here is a SQL*Server query: select 1 where 'a' = 'a ' AND 'a' = 'a ' AND 'a ' = 'a ' It returns (correctly): 1 -Original Message- From: [EMAIL PROTECTED] [mailto:pgsql-general- [EMAIL PROTECTED] On Behalf Of Marc G. Fournier Sent: Wednesday, October 19, 2005 11:41 AM To: [EMAIL PROTECTED] Cc: pgsql-general@postgresql.org Subject: Re: [pgsql-advocacy] [GENERAL] Oracle buys Innobase On Wed, 19 Oct 2005, [EMAIL PROTECTED] wrote: I was referring to trailing blanks, but did not explicitly say it, though showed it in the examples. I am pretty sure that the SQL standard says that trailing whitespace is insignificant in string comparison. Then we are broken too :) # select 'a ' = 'a '; ?column? -- f (1 row) experiment=# SELECT 'a '::char = 'a '::char; ?column? -- t ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] 'a' == 'a ' (Was: RE: [pgsql-advocacy] [GENERAL] Oracle buysInnobase)
-Original Message- From: Terry Fielder [mailto:[EMAIL PROTECTED] Sent: Wednesday, October 19, 2005 2:05 PM To: Dann Corbit Cc: Tino Wildenhain; Marc G. Fournier; [EMAIL PROTECTED]; pgsql-hackers@postgresql.org; pgsql-general@postgresql.org Subject: Re: 'a' == 'a ' (Was: RE: [pgsql-advocacy] [GENERAL] Oracle buysInnobase) Dann Corbit wrote: Try this query in Oracle, SQL*Server, DB/2, Informix, etc.: connxdatasync=# select 1 where cast('a' as varchar(30)) = cast('a ' as varchar(30)); ?column? -- (0 rows) I see how you can interpret the SQL Standard to make the above response a correct one. But is it the response that you would like? When the compared datatypes are VARCHAR: YES What is the value of doing that? I can see plenty of harm and absolutely no return. We are talking about blank padding before comparison. Do you really want 'Danniel ' considered distinct from 'Danniel ' in a comparison? In real life, what does that buy you? And even if people think it is a good idea, it seems to be at odds with the way that I read the standard (though I am easily confused by simple enough language at times). It could well be that I am jaded from years of doing it the wrong way (I expect two character strings with all leading non-blanks in agreement to compare equal). Perhaps this is old hat to the long-timers around here and there is a good explanation as to why varchar should have non-blank padding when comparisons are performed. Can someone point me to documentation that explains it? [snip] ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] 'a' == 'a ' (Was: RE: [pgsql-advocacy] [GENERAL] Oracle buysInnobase)
On Wed, Oct 19, 2005 at 02:05:20PM -0700, Dann Corbit wrote: When the compared datatypes are VARCHAR: YES What is the value of doing that? I can see plenty of harm and absolutely no return. We are talking about blank padding before comparison. Do you really want 'Danniel ' considered distinct from 'Danniel ' in a comparison? In real life, what does that buy you? Well, looking from the point of view of using indexes, indexes can only really match on things that are equal. Which means the system is going to have to trim them anyway. I'm of the opinion that strings are strings and spaces are no different from other characters. That bit of the standard quoted earlier, if you read the PAD character that is different from any other character as being the NUL character, then 'aspacenul' is clearly different from 'aspacespace'. This whacky space behaviour is something I associate with the char(N) and is the main reason I never use it. Perhaps this is old hat to the long-timers around here and there is a good explanation as to why varchar should have non-blank padding when comparisons are performed. Can someone point me to documentation that explains it? The way I understood it: char(N) is blank padding varchar(N) is not If you make varchar(n) do blank padding, then what's the difference between the two types? You may as well get rid of one... Have a nice day, -- Martijn van Oosterhout kleptog@svana.org http://svana.org/kleptog/ Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a tool for doing 5% of the work and then sitting around waiting for someone else to do the other 95% so you can sue them. pgpl2oFftSN1C.pgp Description: PGP signature
Re: [HACKERS] 'a' == 'a ' (Was: RE: [pgsql-advocacy] [GENERAL] Oracle buysInnobase)
-Original Message- From: Stephan Szabo [mailto:[EMAIL PROTECTED] Sent: Wednesday, October 19, 2005 2:34 PM To: Dann Corbit Cc: Terry Fielder; Tino Wildenhain; Marc G. Fournier; [EMAIL PROTECTED]; pgsql-hackers@postgresql.org; pgsql- [EMAIL PROTECTED] Subject: Re: [HACKERS] 'a' == 'a ' (Was: RE: [pgsql-advocacy] [GENERAL] Oracle buysInnobase) On Wed, 19 Oct 2005, Dann Corbit wrote: -Original Message- From: Terry Fielder [mailto:[EMAIL PROTECTED] Sent: Wednesday, October 19, 2005 2:05 PM To: Dann Corbit Cc: Tino Wildenhain; Marc G. Fournier; [EMAIL PROTECTED]; pgsql-hackers@postgresql.org; pgsql-general@postgresql.org Subject: Re: 'a' == 'a ' (Was: RE: [pgsql-advocacy] [GENERAL] Oracle buysInnobase) Dann Corbit wrote: Try this query in Oracle, SQL*Server, DB/2, Informix, etc.: connxdatasync=# select 1 where cast('a' as varchar(30)) = cast('a ' as varchar(30)); ?column? -- (0 rows) I see how you can interpret the SQL Standard to make the above response a correct one. But is it the response that you would like? When the compared datatypes are VARCHAR: YES What is the value of doing that? I can see plenty of harm and absolutely no return. We are talking about blank padding before comparison. Do you really want 'Danniel ' considered distinct from 'Danniel ' in a comparison? In real life, what does that buy you? It buys you the ability to store things where trailing spaces are signficant (for example passwords) within the existing limitations of not having a full set of the collation behavior. I suppose that there may be some value in having 'J^% ' be different from 'J^% '. I would expect to insert a password with trailing blanks to be the same as inserting a password with no trailing blanks. I think that whatever is done ought to be whatever the standard says. If I misinterpret the standard and PostgreSQL is doing it right, then that is fine. It is just that PostgreSQL is very counter-intuitive compared to other database systems that I have used in this one particular area. When I read the standard, it looked to me like PostgreSQL was not performing correctly. It is not unlikely that I read it wrong. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] 'a' == 'a ' (Was: RE: [pgsql-advocacy] [GENERAL] Oracle buysInnobase)
-Original Message- From: Martijn van Oosterhout [mailto:[EMAIL PROTECTED] Sent: Wednesday, October 19, 2005 2:46 PM To: Dann Corbit Cc: Terry Fielder; Tino Wildenhain; Marc G. Fournier; [EMAIL PROTECTED]; pgsql-hackers@postgresql.org; pgsql- [EMAIL PROTECTED] Subject: Re: 'a' == 'a ' (Was: RE: [pgsql-advocacy] [GENERAL] Oracle buysInnobase) On Wed, Oct 19, 2005 at 02:05:20PM -0700, Dann Corbit wrote: When the compared datatypes are VARCHAR: YES What is the value of doing that? I can see plenty of harm and absolutely no return. We are talking about blank padding before comparison. Do you really want 'Danniel ' considered distinct from 'Danniel ' in a comparison? In real life, what does that buy you? Well, looking from the point of view of using indexes, indexes can only really match on things that are equal. Which means the system is going to have to trim them anyway. I'm of the opinion that strings are strings and spaces are no different from other characters. That bit of the standard quoted earlier, if you read the PAD character that is different from any other character as being the NUL character, then 'aspacenul' is clearly different from 'aspacespace'. This whacky space behaviour is something I associate with the char(N) and is the main reason I never use it. Perhaps this is old hat to the long-timers around here and there is a good explanation as to why varchar should have non-blank padding when comparisons are performed. Can someone point me to documentation that explains it? The way I understood it: char(N) is blank padding varchar(N) is not If you make varchar(n) do blank padding, then what's the difference between the two types? Storage. The blank padding is only for comparison purposes. You may as well get rid of one... All the other database systems seem to handle it in the way that I expect. Which is not to say that it is the right way or that it agrees with the standard. But it is how it appears to me, so far. ---(end of broadcast)--- TIP 6: explain analyze is your friend