On Tue, Feb 27, 2024 at 05:20:23PM +0000, PG Doc comments form wrote:
> The following documentation comment has been logged on the website:
> 
> Page: https://www.postgresql.org/docs/16/functions-string.html
> Description:
> 
> Hey,
> 
> I was confused by substring behavior today, when giving 0 as a start
> position. I understand now that string indices are 1-based, have a certain
> flexibility about where to start (allowing negative start positions), and
> that this is defined in the standard SQL spec.
> 
> I'm comfy with all this, but I think it'd be nice to have a hint in the pg
> substring docs for nonpositive start positions, so that users don't have to
> have paid for the standard SQL spec to get past this. To me, substring seems
> like a relatively common function with relatively surprising behavior.

I dug into this and quickly became as confused as you were.  The best
explanation I found of the current behavior is here (with diagram):

        https://www.mssqltips.com/sqlservertutorial/9374/sql-substring-function/
        SELECT SUBSTRING('Hello world',-2,5) as msg

The last Postgres community discussion of this behavior I could find was
from 2007:

        
https://www.postgresql.org/message-id/flat/12803.1168804636%40sss.pgh.pa.us#8316fb2298c9e49f77867a1ae2ead447

This web page explains the feature:

        
https://stackoverflow.com/questions/33462061/sql-server-substring-position-negative-value

but also asks:

        now the only question that remains is, "why would anyone need it
        to behave this way?"

and the answer given is:

        @mao47 Well, it depends. I am not an author of implementation of
        SUBSTR but I guess with negative index it behaves like LEFT(string,
        LEN(string) - 1 - index). It works the same way in PostgreSQL so maybe
        it is SQL standard.

Informix has substring() which matches the SQL standard, and substr()
which uses negative start from the end of the string:

        
https://www.ibm.com/docs/en/informix-servers/14.10?topic=smf-substr-function

Oracle doesn't have substring(), just substr(), and matches Informix
behavior, I think.

I have developed the attached doc patch to document this.  The only
question is whether this substring behavior is so odd that we should not
document it.

-- 
  Bruce Momjian  <br...@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Only you can decide what is important to you.
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index e5fa82c161..45553ab824 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -2783,6 +2783,13 @@ SELECT NOT(ROW(table.*) IS NOT NULL) FROM TABLE; -- detect at least one null in
         specified.  Provide at least one of <parameter>start</parameter>
         and <parameter>count</parameter>.
        </para>
+       <para>
+        Non-positive <parameter>start</parameter> values specify a
+        start position before the first character, and therefore the
+        <parameter>count</parameter> must be two greater than the absolute
+        value of <parameter>start</parameter> to begin returning characters
+        from the start of the string.
+       </para>
        <para>
         <literal>substring('Thomas' from 2 for 3)</literal>
         <returnvalue>hom</returnvalue>
@@ -2794,6 +2801,10 @@ SELECT NOT(ROW(table.*) IS NOT NULL) FROM TABLE; -- detect at least one null in
        <para>
         <literal>substring('Thomas' for 2)</literal>
         <returnvalue>Th</returnvalue>
+       </para>
+       <para>
+        <literal>substring('Thomas' from -4 for 8)</literal>
+        <returnvalue>Tho</returnvalue>
        </para></entry>
       </row>
 

Reply via email to