Joe Conway wrote:
> more work than I had time for when I wrote the current replace(). But as 
> I said, if there is support for getting this into the backend, I'll add 
> it to my todo list:
> 
> - Create new backend function replace()
> - Either create new backend functions, or merge into existing functions: 
> to_hex() and extract_tok()
> 

I'm just starting to take a look at this again. While studying the 
current text_substr() function I found two behaviors which conflict with 
specific SQL92/SQL99 requirements, and one bug. First the spec 
compliance -- SQL92 section 6.7/SQL99 section 6.18 say:

If <character substring function> is specified, then:
a) Let C be the value of the <character value expression>, let LC be the
    length of C, and let S be the value of the <start position>.
b) If <string length> is specified, then let L be the value of <string
    length> and let E be S+L. Otherwise, let E be the larger of LC + 1
    and S.
c) If either C, S, or L is the null value, then the result of the
    <character substring function> is the null value.
d) If E is less than S, then an exception condition is raised: data
    exception-substring error.
e) Case:
    i) If S is greater than LC or if E is less than 1, then the result of
       the <character substring function> is a zero-length string.
   ii) Otherwise,
       1) Let SI be the larger of S and 1. Let El be the smaller of E and
          LC+l. Let Ll be El-Sl.
       2) The result of the <character substring function> is a character
          string containing the Ll characters of C starting at character
          number Sl in the same order that the characters appear in C.

The only way for d) to be true is when L < 0. Instead of an error, we do:
test=# select substr('hello',2,-1);
  substr
--------
  ello
(1 row)

The other spec issue is wrt para e)i). If E (=S+L) < 1, we should return 
a zero-length string. Currently I get:
test=# select substr('hello',-4,3);
  substr
--------
  hello
(1 row)

Neither behavior is documented (unless it's somewhere other than:
http://developer.postgresql.org/docs/postgres/functions-string.html ).

The bug is this one:
test=# create DATABASE testmb with encoding = 'EUC_JP';
CREATE DATABASE
test=# \c testmb
You are now connected to database testmb.
testmb=# select substr('hello',6,2);
  substr
--------
  ~
(1 row)

testmb=# \c test
You are now connected to database test.
test=# select substr('hello',6,2);
  substr
--------

(1 row)

The multibyte database behavior is the bug. The SQL_ASCII behavior is 
correct (zero-length string):
test=# select substr('hello',6,2) is null;
  ?column?
----------
  f
(1 row)


Any objection if I rework this function to meet SQL92 and fix the bug? 
Or is the SQL92 part not desirable because it breaks backward 
compatability?

In any case, can the #ifdef MULTIBYTE's be removed now in favor of a 
test for encoding max length?

Joe


---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly

Reply via email to