Joe Conway wrote:
> more work than I had time for when I wrote the current replace(). But as
> I said, if there is support for getting this into the backend, I'll add
> it to my todo list:
>
> - Create new backend function replace()
> - Either create new backend functions, or merge into existing functions:
> to_hex() and extract_tok()
>
I'm just starting to take a look at this again. While studying the
current text_substr() function I found two behaviors which conflict with
specific SQL92/SQL99 requirements, and one bug. First the spec
compliance -- SQL92 section 6.7/SQL99 section 6.18 say:
If <character substring function> is specified, then:
a) Let C be the value of the <character value expression>, let LC be the
length of C, and let S be the value of the <start position>.
b) If <string length> is specified, then let L be the value of <string
length> and let E be S+L. Otherwise, let E be the larger of LC + 1
and S.
c) If either C, S, or L is the null value, then the result of the
<character substring function> is the null value.
d) If E is less than S, then an exception condition is raised: data
exception-substring error.
e) Case:
i) If S is greater than LC or if E is less than 1, then the result of
the <character substring function> is a zero-length string.
ii) Otherwise,
1) Let SI be the larger of S and 1. Let El be the smaller of E and
LC+l. Let Ll be El-Sl.
2) The result of the <character substring function> is a character
string containing the Ll characters of C starting at character
number Sl in the same order that the characters appear in C.
The only way for d) to be true is when L < 0. Instead of an error, we do:
test=# select substr('hello',2,-1);
substr
--------
ello
(1 row)
The other spec issue is wrt para e)i). If E (=S+L) < 1, we should return
a zero-length string. Currently I get:
test=# select substr('hello',-4,3);
substr
--------
hello
(1 row)
Neither behavior is documented (unless it's somewhere other than:
http://developer.postgresql.org/docs/postgres/functions-string.html ).
The bug is this one:
test=# create DATABASE testmb with encoding = 'EUC_JP';
CREATE DATABASE
test=# \c testmb
You are now connected to database testmb.
testmb=# select substr('hello',6,2);
substr
--------
~
(1 row)
testmb=# \c test
You are now connected to database test.
test=# select substr('hello',6,2);
substr
--------
(1 row)
The multibyte database behavior is the bug. The SQL_ASCII behavior is
correct (zero-length string):
test=# select substr('hello',6,2) is null;
?column?
----------
f
(1 row)
Any objection if I rework this function to meet SQL92 and fix the bug?
Or is the SQL92 part not desirable because it breaks backward
compatability?
In any case, can the #ifdef MULTIBYTE's be removed now in favor of a
test for encoding max length?
Joe
---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly