Re: [HACKERS] Like vs '=' bug with indexing
Tom Lane wrote: m w [EMAIL PROTECTED] writes: I think this exposes a bug in postgres where either index or table scans (I'm not sure which just yet) treat a zero differently than a non-zero in a varchar. Embedded zeroes aren't supported in char/varchar/text fields, and cannot be supported in a portable fashion, since these datatypes rely on functions like strcoll() that don't allow embedded nulls in strings. Is there no simple (i.e. cheap) way to disallow \0 alltogether for these types then ? perhaps just strip them out in textin() (or is it text_in()) ? It wouldn't surprise me too much if there are inconsistent behaviors between indexscans and seqscans for such invalid data. should'nt they both use the _same_ strcoll() and friends ? It doesn't seem real practical for us to examine the output of every C-coded function to make sure it produces a valid value of the datatype. Illegal returned values are a fault of the function, and ensuing misbehaviors are still its fault ... Should we not examine "the _possible_ outputs of every C-coded function to make sure it produces a valid value of the datatype" ;) For me producing an invalid data for a datatype seems very much like a bug and it _should_ be reported. - Hannu
Re: [HACKERS] Like vs '=' bug with indexing
--- Hannu Krosing [EMAIL PROTECTED] wrote: Should we not examine "the _possible_ outputs of every C-coded function to make sure it produces a valid value of the datatype" ;) For me producing an invalid data for a datatype seems very much like a bug and it _should_ be reported. No, I think Tom is right, there should be no validation on C functions incorporated into Postgres by users. Who wants that overhead in a production system? However, I think when the same SQL query produces different results when you add an index, speaks of an inconsistency in the system, which could be the source of other problems. I have seen a couple posts where results from an index scan are not the same as the results from a table scan, granted they were language issues, but still, my gut tells me if I set the length of a variable to x, and a trailing zero is included, the system should either fail consistently or work consistently. I don't care which, it should just be consistent. Inconsistent behavior indicates that a different matching algorithm is used if one uses an index instead of a table scan. That scares me. __ Get personalized email addresses from Yahoo! Mail - only $35 a year! http://personal.mail.yahoo.com/
Re: [HACKERS] Like vs '=' bug with indexing
Hannu Krosing [EMAIL PROTECTED] writes: Is there no simple (i.e. cheap) way to disallow \0 alltogether for these types then ? perhaps just strip them out in textin() (or is it text_in()) ? They *are* stripped out in textin(), by virtue of the fact that textin expects to see a null-terminated input string. It wouldn't surprise me too much if there are inconsistent behaviors between indexscans and seqscans for such invalid data. should'nt they both use the _same_ strcoll() and friends ? Irrelevant; the issue is that the various comparison operators may produce inconsistent results given invalid input. For instance texteq() short-circuits to a FALSE result if the lengths of the inputs are different, which means that 'ab\0' = 'ab' will produce false, even though a strcoll-based comparison will claim they are equal. I don't think that means that texteq() is wrong to check the lengths first. Should we not examine "the _possible_ outputs of every C-coded function to make sure it produces a valid value of the datatype" ;) Go for it. Possibly chr() should reject chr(0) ... regards, tom lane
Re: [HACKERS] Like vs '=' bug with indexing
m w [EMAIL PROTECTED] writes: Inconsistent behavior indicates that a different matching algorithm is used if one uses an index instead of a table scan. That scares me. A seq scan and an index scan are inherently different algorithms, so I don't see exactly how you think we can avoid this risk. In particular, if you are dealing with a btree index and a "WHERE column = constant" query, then a seq scan is only going to be concerned with the behavior of the '=' operator --- does it return TRUE or not for any particular row? But an index search is inherently going to make ordered comparisons (, =, ). So there is always a potential for inconsistent behavior if the ordering operators produce results that are inconsistent with simple '='. We cannot design that away --- all we can do is fix such bugs when one is discovered in a particular datatype. regards, tom lane
Re: [HACKERS] Like vs '=' bug with indexing
Tom Lane wrote: Hannu Krosing [EMAIL PROTECTED] writes: Is there no simple (i.e. cheap) way to disallow \0 alltogether for these types then ? perhaps just strip them out in textin() (or is it text_in()) ? They *are* stripped out in textin(), by virtue of the fact that textin expects to see a null-terminated input string. Ok, I was mistaken to think that pg_trigger.tgargs contained real \0's and not fakes and I was able to get similar output from other char types by using \\000 - You never can tell how many are required to input a single \ to next level. Also I remember being told that bytea _can_ hold embedded \0, no ? Should we not examine "the _possible_ outputs of every C-coded function to make sure it produces a valid value of the datatype" ;) Go for it. Possibly chr() should reject chr(0) ... there is no function chr() at least in 7.0.2. and char is not usable from psql (gives out strange errors); hannu=# select char(0); ERROR: length for type 'bpchar' must be at least 1 hannu=# select char(32); ERROR: parser: parse error at or near ";" hannu=# select char('a'); ERROR: parser: parse error at or near "'" could it be possible somehow distinguish between user callable (safe) functions and "internal" ones ? Hannu
Re: [HACKERS] Like vs '=' bug with indexing
Hannu Krosing [EMAIL PROTECTED] writes: Also I remember being told that bytea _can_ hold embedded \0, no ? Yes. What has that got to do with text et al? Possibly chr() should reject chr(0) ... there is no function chr() at least in 7.0.2. I think it used to be called ichar(), but then someone pointed out that Oracle calls it chr(). and char is not usable from psql (gives out strange errors); char(n) is a type name, not a function call. regards, tom lane
Re: [HACKERS] Like vs '=' bug with indexing
m w [EMAIL PROTECTED] writes: Here is the problem: Depending on whether there is an index or not, 'like' behaves differently. Please provide a complete, self-contained example with which we can reproduce the problem. regards, tom lane
Re: [HACKERS] Like vs '=' bug with indexing
--- Tom Lane [EMAIL PROTECTED] wrote: m w [EMAIL PROTECTED] writes: Here is the problem: Depending on whether there is an index or not, 'like' behaves differently. Please provide a complete, self-contained example with which we can reproduce the problem. I am trying to create a test function and some data that reproduces the problem easily, but I think I know what it is. It is a two bug issue. I had a bug in my code in that I added the "\0" to the end of my strings when I converted from a C string to a postgres "text" object. I think this exposes a bug in postgres where either index or table scans (I'm not sure which just yet) treat a zero differently than a non-zero in a varchar. It looks as if some section of code is using the zero to terminate a string and another section of code is not. So at some point data which should be the same differes either on length of data, or the trailing zero compared to an uninitialized byte. Removing the terminating zero from the postgres string fixes the problem, but, if I understand postgres well enough, this should not make a difference, and should be reported as a bug anyway. __ Get personalized email addresses from Yahoo! Mail - only $35 a year! http://personal.mail.yahoo.com/
Re: [HACKERS] Like vs '=' bug with indexing
m w [EMAIL PROTECTED] writes: I think this exposes a bug in postgres where either index or table scans (I'm not sure which just yet) treat a zero differently than a non-zero in a varchar. Embedded zeroes aren't supported in char/varchar/text fields, and cannot be supported in a portable fashion, since these datatypes rely on functions like strcoll() that don't allow embedded nulls in strings. It wouldn't surprise me too much if there are inconsistent behaviors between indexscans and seqscans for such invalid data. It doesn't seem real practical for us to examine the output of every C-coded function to make sure it produces a valid value of the datatype. Illegal returned values are a fault of the function, and ensuing misbehaviors are still its fault ... regards, tom lane