Re: [HACKERS] 'a' == 'a ' (Was: RE: [pgsql-advocacy] [GENERAL] Oracle buysInnobase)

2005-10-21 Thread Rick Morris

Richard Huxton wrote:

Dann Corbit wrote:



I can see plenty of harm and absolutely no return.  We are talking about
blank padding before comparison.  Do you really want 'Danniel '
considered distinct from 'Danniel  ' in a comparison?  In real life,
what does that buy you?



100% YES!

If two values are the same, then any function on those two values should 
return the same result. Otherwise what does equals mean? At the very 
least length() is broken by your argument.


I agree completely. I would much rather be precise than intuitive. And, 
I have done applications where the padding length was important, 
especially when working with remote batch processing in the credit 
industries and the like. Writing queries to create and process 
fixed-width batch files is much easier if you can rely on these kinds of 
behaviors.




Here it's CHAR that's broken IMHO - spawn of some punch-card spawned 
data processing rules of the 70s.

--
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster





---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] 'a' == 'a ' (Was: RE: [pgsql-advocacy] [GENERAL] Oracle buysInnobase)

2005-10-20 Thread Richard Huxton

Dann Corbit wrote:


I can see plenty of harm and absolutely no return.  We are talking about
blank padding before comparison.  Do you really want 'Danniel '
considered distinct from 'Danniel  ' in a comparison?  In real life,
what does that buy you?


100% YES!

If two values are the same, then any function on those two values should 
return the same result. Otherwise what does equals mean? At the very 
least length() is broken by your argument.


Here it's CHAR that's broken IMHO - spawn of some punch-card spawned 
data processing rules of the 70s.

--
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [HACKERS] 'a' == 'a ' (Was: RE: [pgsql-advocacy] [GENERAL] Oracle buysInnobase)

2005-10-20 Thread Terry Fielder



Dann Corbit wrote:

Try this query in Oracle, SQL*Server, DB/2, Informix, etc.:

connxdatasync=# select 1 where cast('a' as varchar(30)) = cast('a ' as
varchar(30));
 ?column?
--
(0 rows)

I see how you can interpret the SQL Standard to make the above response
a correct one.  But is it the response that you would like?


When the compared datatypes are VARCHAR: YES
When the compared datatypes are fixed length like CHAR: NO



Suppose, for instance, that you have an employee table in your database.
You have another table called benefits.  Perhaps it is even in another
database.  Conceivably even in a database other than PostgreSQL.

Anyway, you want to match information between the two systems so you
join on some redundant columns like a.First_Name = b.fname AND
a.Last_name = b.lname AND a.Middle_Initial = b.mi AND a.City = b.city
AND a.Street_name = b.street

If the columns are not the same length (or one is fixed length and the
other variable), then 'Danniel' won't match 'Danniel' unless you trim
it.  If you trim it, then the indexes go out the window.  If the indexes
go out the window, then we table scan.


If you want to compare datatypes that are different, and you don't have 
an index that accomodates, that's the price you pay for comparing 
different data types.




I don't like that behavior.  Perhaps others who know more than me can
say why not blank padding comparisons is a good idea.

Clearly, one could argue that having redundant data is bad and that
every attribute in a database intended to match should be exactly the
same type.  But there are lots of database systems badly designed.  And
of well designed systems, it is not uncommon to have more than one
database in your organization, and a need to perform federated joins as
well because of it.



-Original Message-
From: Tino Wildenhain [mailto:[EMAIL PROTECTED]
Sent: Wednesday, October 19, 2005 1:05 PM
To: Marc G. Fournier
Cc: Dann Corbit; [EMAIL PROTECTED]; pgsql-
[EMAIL PROTECTED]; pgsql-general@postgresql.org
Subject: Re: 'a' == 'a ' (Was: RE: [pgsql-advocacy] [GENERAL] Oracle
buysInnobase)

Am Mittwoch, den 19.10.2005, 16:29 -0300 schrieb Marc G. Fournier:


I'm CC'ng this over to -hackers ... Tom?  Comments?

On Wed, 19 Oct 2005, Dann Corbit wrote:



Yes, clearly that is the wrong result according to the SQL


standard.


Here is a SQL*Server query:
select 1 where 'a' = 'a ' AND 'a' = 'a  ' AND 'a ' = 'a '

It returns (correctly): 1



-Original Message-
From: [EMAIL PROTECTED] [mailto:pgsql-general-
[EMAIL PROTECTED] On Behalf Of Marc G. Fournier
Sent: Wednesday, October 19, 2005 11:41 AM
To: [EMAIL PROTECTED]
Cc: pgsql-general@postgresql.org
Subject: Re: [pgsql-advocacy] [GENERAL] Oracle buys Innobase

On Wed, 19 Oct 2005, [EMAIL PROTECTED] wrote:



I was referring to trailing blanks, but did not explicitly say


it,


though showed it in the examples.  I am pretty sure that the SQL
standard says that trailing whitespace is insignificant in


string


comparison.


Then we are broken too :)

# select 'a ' = 'a  ';
 ?column?
--
 f
(1 row)



experiment=# SELECT 'a '::char = 'a  '::char;
?column?
--
t





---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match



--
Terry Fielder
[EMAIL PROTECTED]
Associate Director Software Development and Deployment
Great Gulf Homes / Ashton Woods Homes
Fax: (416) 441-9085

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] 'a' == 'a ' (Was: RE: [pgsql-advocacy] [GENERAL] Oracle buysInnobase)

2005-10-19 Thread Dann Corbit
Try this query in Oracle, SQL*Server, DB/2, Informix, etc.:

connxdatasync=# select 1 where cast('a' as varchar(30)) = cast('a ' as
varchar(30));
 ?column?
--
(0 rows)

I see how you can interpret the SQL Standard to make the above response
a correct one.  But is it the response that you would like?

Suppose, for instance, that you have an employee table in your database.
You have another table called benefits.  Perhaps it is even in another
database.  Conceivably even in a database other than PostgreSQL.

Anyway, you want to match information between the two systems so you
join on some redundant columns like a.First_Name = b.fname AND
a.Last_name = b.lname AND a.Middle_Initial = b.mi AND a.City = b.city
AND a.Street_name = b.street

If the columns are not the same length (or one is fixed length and the
other variable), then 'Danniel' won't match 'Danniel' unless you trim
it.  If you trim it, then the indexes go out the window.  If the indexes
go out the window, then we table scan.

I don't like that behavior.  Perhaps others who know more than me can
say why not blank padding comparisons is a good idea.

Clearly, one could argue that having redundant data is bad and that
every attribute in a database intended to match should be exactly the
same type.  But there are lots of database systems badly designed.  And
of well designed systems, it is not uncommon to have more than one
database in your organization, and a need to perform federated joins as
well because of it.

 -Original Message-
 From: Tino Wildenhain [mailto:[EMAIL PROTECTED]
 Sent: Wednesday, October 19, 2005 1:05 PM
 To: Marc G. Fournier
 Cc: Dann Corbit; [EMAIL PROTECTED]; pgsql-
 [EMAIL PROTECTED]; pgsql-general@postgresql.org
 Subject: Re: 'a' == 'a ' (Was: RE: [pgsql-advocacy] [GENERAL] Oracle
 buysInnobase)
 
 Am Mittwoch, den 19.10.2005, 16:29 -0300 schrieb Marc G. Fournier:
  I'm CC'ng this over to -hackers ... Tom?  Comments?
 
  On Wed, 19 Oct 2005, Dann Corbit wrote:
 
   Yes, clearly that is the wrong result according to the SQL
standard.
  
   Here is a SQL*Server query:
   select 1 where 'a' = 'a ' AND 'a' = 'a  ' AND 'a ' = 'a '
  
   It returns (correctly): 1
  
   -Original Message-
   From: [EMAIL PROTECTED] [mailto:pgsql-general-
   [EMAIL PROTECTED] On Behalf Of Marc G. Fournier
   Sent: Wednesday, October 19, 2005 11:41 AM
   To: [EMAIL PROTECTED]
   Cc: pgsql-general@postgresql.org
   Subject: Re: [pgsql-advocacy] [GENERAL] Oracle buys Innobase
  
   On Wed, 19 Oct 2005, [EMAIL PROTECTED] wrote:
  
   I was referring to trailing blanks, but did not explicitly say
it,
   though showed it in the examples.  I am pretty sure that the SQL
   standard says that trailing whitespace is insignificant in
string
   comparison.
  
   Then we are broken too :)
  
   # select 'a ' = 'a  ';
 ?column?
   --
 f
   (1 row)
 
 
 experiment=# SELECT 'a '::char = 'a  '::char;
  ?column?
 --
  t
 


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] 'a' == 'a ' (Was: RE: [pgsql-advocacy] [GENERAL] Oracle buysInnobase)

2005-10-19 Thread Dann Corbit
 -Original Message-
 From: Terry Fielder [mailto:[EMAIL PROTECTED]
 Sent: Wednesday, October 19, 2005 2:05 PM
 To: Dann Corbit
 Cc: Tino Wildenhain; Marc G. Fournier; [EMAIL PROTECTED];
 pgsql-hackers@postgresql.org; pgsql-general@postgresql.org
 Subject: Re: 'a' == 'a ' (Was: RE: [pgsql-advocacy] [GENERAL] Oracle
 buysInnobase)
 
 
 
 Dann Corbit wrote:
  Try this query in Oracle, SQL*Server, DB/2, Informix, etc.:
 
  connxdatasync=# select 1 where cast('a' as varchar(30)) = cast('a '
as
  varchar(30));
   ?column?
  --
  (0 rows)
 
  I see how you can interpret the SQL Standard to make the above
response
  a correct one.  But is it the response that you would like?
 
 When the compared datatypes are VARCHAR: YES

What is the value of doing that?

I can see plenty of harm and absolutely no return.  We are talking about
blank padding before comparison.  Do you really want 'Danniel '
considered distinct from 'Danniel  ' in a comparison?  In real life,
what does that buy you?

And even if people think it is a good idea, it seems to be at odds with
the way that I read the standard (though I am easily confused by simple
enough language at times).

It could well be that I am jaded from years of doing it the wrong way (I
expect two character strings with all leading non-blanks in agreement to
compare equal).

Perhaps this is old hat to the long-timers around here and there is a
good explanation as to why varchar should have non-blank padding when
comparisons are performed.  Can someone point me to documentation that
explains it?

[snip]

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] 'a' == 'a ' (Was: RE: [pgsql-advocacy] [GENERAL] Oracle buysInnobase)

2005-10-19 Thread Martijn van Oosterhout
On Wed, Oct 19, 2005 at 02:05:20PM -0700, Dann Corbit wrote:
  When the compared datatypes are VARCHAR: YES
 
 What is the value of doing that?
 
 I can see plenty of harm and absolutely no return.  We are talking about
 blank padding before comparison.  Do you really want 'Danniel '
 considered distinct from 'Danniel  ' in a comparison?  In real life,
 what does that buy you?

Well, looking from the point of view of using indexes, indexes can only
really match on things that are equal. Which means the system is going
to have to trim them anyway. I'm of the opinion that strings are
strings and spaces are no different from other characters.

That bit of the standard quoted earlier, if you read the PAD character
that is different from any other character as being the NUL character,
then 'aspacenul' is clearly different from 'aspacespace'. This
whacky space behaviour is something I associate with the char(N) and is
the main reason I never use it.

 Perhaps this is old hat to the long-timers around here and there is a
 good explanation as to why varchar should have non-blank padding when
 comparisons are performed.  Can someone point me to documentation that
 explains it?

The way I understood it:

char(N) is blank padding
varchar(N) is not

If you make varchar(n) do blank padding, then what's the difference
between the two types? You may as well get rid of one...

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
 tool for doing 5% of the work and then sitting around waiting for someone
 else to do the other 95% so you can sue them.


pgpl2oFftSN1C.pgp
Description: PGP signature


Re: [HACKERS] 'a' == 'a ' (Was: RE: [pgsql-advocacy] [GENERAL] Oracle buysInnobase)

2005-10-19 Thread Dann Corbit
 -Original Message-
 From: Stephan Szabo [mailto:[EMAIL PROTECTED]
 Sent: Wednesday, October 19, 2005 2:34 PM
 To: Dann Corbit
 Cc: Terry Fielder; Tino Wildenhain; Marc G. Fournier;
 [EMAIL PROTECTED]; pgsql-hackers@postgresql.org; pgsql-
 [EMAIL PROTECTED]
 Subject: Re: [HACKERS] 'a' == 'a ' (Was: RE: [pgsql-advocacy]
[GENERAL]
 Oracle buysInnobase)
 
 On Wed, 19 Oct 2005, Dann Corbit wrote:
 
   -Original Message-
   From: Terry Fielder [mailto:[EMAIL PROTECTED]
   Sent: Wednesday, October 19, 2005 2:05 PM
   To: Dann Corbit
   Cc: Tino Wildenhain; Marc G. Fournier;
[EMAIL PROTECTED];
   pgsql-hackers@postgresql.org; pgsql-general@postgresql.org
   Subject: Re: 'a' == 'a ' (Was: RE: [pgsql-advocacy] [GENERAL]
Oracle
   buysInnobase)
  
  
  
   Dann Corbit wrote:
Try this query in Oracle, SQL*Server, DB/2, Informix, etc.:
   
connxdatasync=# select 1 where cast('a' as varchar(30)) =
cast('a '
  as
varchar(30));
 ?column?
--
(0 rows)
   
I see how you can interpret the SQL Standard to make the above
  response
a correct one.  But is it the response that you would like?
  
   When the compared datatypes are VARCHAR: YES
 
  What is the value of doing that?
 
  I can see plenty of harm and absolutely no return.  We are talking
about
  blank padding before comparison.  Do you really want 'Danniel '
  considered distinct from 'Danniel  ' in a comparison?  In real life,
  what does that buy you?
 
 It buys you the ability to store things where trailing spaces are
 signficant (for example passwords) within the existing limitations of
not
 having a full set of the collation behavior.

I suppose that there may be some value in having 'J^% ' be different
from 'J^%   '.

I would expect to insert a password with trailing blanks to be the same
as inserting a password with no trailing blanks.

I think that whatever is done ought to be whatever the standard says.
If I misinterpret the standard and PostgreSQL is doing it right, then
that is fine.  It is just that PostgreSQL is very counter-intuitive
compared to other database systems that I have used in this one
particular area.  When I read the standard, it looked to me like
PostgreSQL was not performing correctly.  It is not unlikely that I read
it wrong.


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] 'a' == 'a ' (Was: RE: [pgsql-advocacy] [GENERAL] Oracle buysInnobase)

2005-10-19 Thread Dann Corbit
 -Original Message-
 From: Martijn van Oosterhout [mailto:[EMAIL PROTECTED]
 Sent: Wednesday, October 19, 2005 2:46 PM
 To: Dann Corbit
 Cc: Terry Fielder; Tino Wildenhain; Marc G. Fournier;
 [EMAIL PROTECTED]; pgsql-hackers@postgresql.org; pgsql-
 [EMAIL PROTECTED]
 Subject: Re: 'a' == 'a ' (Was: RE: [pgsql-advocacy] [GENERAL] Oracle
 buysInnobase)
 
 On Wed, Oct 19, 2005 at 02:05:20PM -0700, Dann Corbit wrote:
   When the compared datatypes are VARCHAR: YES
 
  What is the value of doing that?
 
  I can see plenty of harm and absolutely no return.  We are talking
about
  blank padding before comparison.  Do you really want 'Danniel '
  considered distinct from 'Danniel  ' in a comparison?  In real life,
  what does that buy you?
 
 Well, looking from the point of view of using indexes, indexes can
only
 really match on things that are equal. Which means the system is going
 to have to trim them anyway. I'm of the opinion that strings are
 strings and spaces are no different from other characters.
 
 That bit of the standard quoted earlier, if you read the PAD character
 that is different from any other character as being the NUL character,
 then 'aspacenul' is clearly different from 'aspacespace'. This
 whacky space behaviour is something I associate with the char(N) and
is
 the main reason I never use it.
 
  Perhaps this is old hat to the long-timers around here and there is
a
  good explanation as to why varchar should have non-blank padding
when
  comparisons are performed.  Can someone point me to documentation
that
  explains it?
 
 The way I understood it:
 
 char(N) is blank padding
 varchar(N) is not
 
 If you make varchar(n) do blank padding, then what's the difference
 between the two types? 

Storage.  The blank padding is only for comparison purposes.

You may as well get rid of one...

All the other database systems seem to handle it in the way that I
expect.
Which is not to say that it is the right way or that it agrees with the
standard.
But it is how it appears to me, so far.

---(end of broadcast)---
TIP 6: explain analyze is your friend