Re: [HACKERS] Like vs '=' bug with indexing

2001-02-04 Thread Hannu Krosing

Tom Lane wrote:
 
 m w [EMAIL PROTECTED] writes:
  I think this exposes a bug in postgres where either
  index or table scans (I'm not sure which just yet)
  treat a zero differently than a non-zero in a varchar.
 
 Embedded zeroes aren't supported in char/varchar/text fields,
 and cannot be supported in a portable fashion, since these datatypes
 rely on functions like strcoll() that don't allow embedded nulls in
 strings.

Is there no simple (i.e. cheap) way to disallow \0 alltogether for 
these types then ?

perhaps just strip them out in textin() (or is it text_in()) ?

 It wouldn't surprise me too much if there are inconsistent
 behaviors between indexscans and seqscans for such invalid data.

should'nt they both use the _same_ strcoll() and friends ?

 It doesn't seem real practical for us to examine the output of every
 C-coded function to make sure it produces a valid value of the datatype.
 Illegal returned values are a fault of the function, and ensuing
 misbehaviors are still its fault ...

Should we not examine "the _possible_ outputs of every C-coded function 
to make sure it produces a valid value of the datatype" ;)

For me producing an invalid data for a datatype seems very much like 
a bug and it _should_ be reported.

-
Hannu



Re: [HACKERS] Like vs '=' bug with indexing

2001-02-04 Thread m w


--- Hannu Krosing [EMAIL PROTECTED] wrote:

 Should we not examine "the _possible_ outputs of
 every C-coded function 
 to make sure it produces a valid value of the
 datatype" ;)
 
 For me producing an invalid data for a datatype
 seems very much like 
 a bug and it _should_ be reported.

No, I think Tom is right, there should be no
validation on C functions incorporated into Postgres
by users. Who wants that overhead in a production
system?

However, I think when the same SQL query produces
different results when you add an index, speaks of an
inconsistency in the system, which could be the source
of other problems.

I have seen a couple posts where results from an index
scan are not the same as the results from a table
scan, granted they were language issues, but still, my
gut tells me if I set the length of a variable to x,
and a trailing zero is included, the system should
either fail consistently or work consistently. I don't
care which, it should just be consistent.

Inconsistent behavior indicates that a different
matching algorithm is used if one uses an index
instead of a table scan. That scares me.

__
Get personalized email addresses from Yahoo! Mail - only $35 
a year!  http://personal.mail.yahoo.com/



Re: [HACKERS] Like vs '=' bug with indexing

2001-02-04 Thread Tom Lane

Hannu Krosing [EMAIL PROTECTED] writes:
 Is there no simple (i.e. cheap) way to disallow \0 alltogether for 
 these types then ?
 perhaps just strip them out in textin() (or is it text_in()) ?

They *are* stripped out in textin(), by virtue of the fact that
textin expects to see a null-terminated input string.

 It wouldn't surprise me too much if there are inconsistent
 behaviors between indexscans and seqscans for such invalid data.

 should'nt they both use the _same_ strcoll() and friends ?

Irrelevant; the issue is that the various comparison operators may
produce inconsistent results given invalid input.  For instance
texteq() short-circuits to a FALSE result if the lengths of the
inputs are different, which means that  'ab\0' = 'ab'  will produce
false, even though a strcoll-based comparison will claim they are
equal.  I don't think that means that texteq() is wrong to check the
lengths first.

 Should we not examine "the _possible_ outputs of every C-coded function 
 to make sure it produces a valid value of the datatype" ;)

Go for it.

Possibly chr() should reject chr(0) ...

regards, tom lane



Re: [HACKERS] Like vs '=' bug with indexing

2001-02-04 Thread Tom Lane

m w [EMAIL PROTECTED] writes:
 Inconsistent behavior indicates that a different
 matching algorithm is used if one uses an index
 instead of a table scan. That scares me.

A seq scan and an index scan are inherently different algorithms,
so I don't see exactly how you think we can avoid this risk.

In particular, if you are dealing with a btree index and a "WHERE
column = constant" query, then a seq scan is only going to be concerned
with the behavior of the '=' operator --- does it return TRUE or not
for any particular row?  But an index search is inherently going to make
ordered comparisons (, =, ).  So there is always a potential for
inconsistent behavior if the ordering operators produce results that are
inconsistent with simple '='.  We cannot design that away --- all we can
do is fix such bugs when one is discovered in a particular datatype.

regards, tom lane



Re: [HACKERS] Like vs '=' bug with indexing

2001-02-04 Thread Hannu Krosing

Tom Lane wrote:
 
 Hannu Krosing [EMAIL PROTECTED] writes:
  Is there no simple (i.e. cheap) way to disallow \0 alltogether for
  these types then ?
  perhaps just strip them out in textin() (or is it text_in()) ?
 
 They *are* stripped out in textin(), by virtue of the fact that
 textin expects to see a null-terminated input string.

Ok, I was mistaken to think that pg_trigger.tgargs contained real \0's 
and not fakes and I was able to get similar output from other char 
types by using \\000 - You never can tell how many  are required to 
input a single \ to next level.

Also I remember being told that bytea _can_ hold embedded \0, no ?
 
  Should we not examine "the _possible_ outputs of every C-coded function
  to make sure it produces a valid value of the datatype" ;)
 
 Go for it.
 
 Possibly chr() should reject chr(0) ...

there is no function chr() at least in 7.0.2. 

and char is not usable from psql (gives out strange errors);

hannu=# select char(0);
ERROR:  length for type 'bpchar' must be at least 1
hannu=# select char(32);
ERROR:  parser: parse error at or near ";"
hannu=# select char('a');
ERROR:  parser: parse error at or near "'"

could it be possible somehow distinguish between user callable (safe) 
functions and "internal" ones ?


Hannu



Re: [HACKERS] Like vs '=' bug with indexing

2001-02-04 Thread Tom Lane

Hannu Krosing [EMAIL PROTECTED] writes:
 Also I remember being told that bytea _can_ hold embedded \0, no ?

Yes.  What has that got to do with text et al?

 Possibly chr() should reject chr(0) ...

 there is no function chr() at least in 7.0.2. 

I think it used to be called ichar(), but then someone pointed out that
Oracle calls it chr().

 and char is not usable from psql (gives out strange errors);

char(n) is a type name, not a function call.

regards, tom lane



Re: [HACKERS] Like vs '=' bug with indexing

2001-02-03 Thread Tom Lane

m w [EMAIL PROTECTED] writes:
 Here is the problem: Depending on whether there is an
 index or not, 'like' behaves differently.

Please provide a complete, self-contained example with which
we can reproduce the problem.

regards, tom lane



Re: [HACKERS] Like vs '=' bug with indexing

2001-02-03 Thread m w


--- Tom Lane [EMAIL PROTECTED] wrote:
 m w [EMAIL PROTECTED] writes:
  Here is the problem: Depending on whether there is
 an
  index or not, 'like' behaves differently.
 
 Please provide a complete, self-contained example
 with which
 we can reproduce the problem.

I am trying to create a test function and some data
that reproduces the problem easily, but I think I know
what it is.

It is a two bug issue. I had a bug in my code in that
I added the "\0" to the end of my strings when I
converted from a C string to a postgres "text" object.


I think this exposes a bug in postgres where either
index or table scans (I'm not sure which just yet)
treat a zero differently than a non-zero in a varchar.

It looks as if some section of code is using the zero
to terminate a string and another section of code is
not. So at some point data which should be the same
differes either on length of data, or the trailing
zero compared to an uninitialized byte.

Removing the terminating zero from the postgres string
fixes the problem, but, if I understand postgres well
enough, this should not make a difference, and should
be reported as a bug anyway.


__
Get personalized email addresses from Yahoo! Mail - only $35 
a year!  http://personal.mail.yahoo.com/



Re: [HACKERS] Like vs '=' bug with indexing

2001-02-03 Thread Tom Lane

m w [EMAIL PROTECTED] writes:
 I think this exposes a bug in postgres where either
 index or table scans (I'm not sure which just yet)
 treat a zero differently than a non-zero in a varchar.

Embedded zeroes aren't supported in char/varchar/text fields,
and cannot be supported in a portable fashion, since these datatypes
rely on functions like strcoll() that don't allow embedded nulls in
strings.  It wouldn't surprise me too much if there are inconsistent
behaviors between indexscans and seqscans for such invalid data.

It doesn't seem real practical for us to examine the output of every
C-coded function to make sure it produces a valid value of the datatype.
Illegal returned values are a fault of the function, and ensuing
misbehaviors are still its fault ...

regards, tom lane