On Tue, 17 Oct 2023 at 20:39, David Rowley <dgrowle...@gmail.com> wrote:
>
> On Mon, 16 Oct 2023 at 05:56, Tom Lane <t...@sss.pgh.pa.us> wrote:
> > I guess the next question is whether we want to stop here or
> > try to relax the requirement about NUL-termination.  I'd be inclined
> > to call that a separate issue deserving a separate commit, so maybe
> > we should go ahead and commit this much anyway.
>
> I am keen to see this relaxed. I agree that a separate effort is best.

I looked at the latest posted patch again today with thoughts about
pushing it but there's something I'm a bit unhappy with that makes me
think we should maybe do the NUL-termination relaxation in the same
commit.

The problem is in LogicalRepApplyLoop() the current patch adjusts the
manual building of the StringInfoData to make use of
initReadOnlyStringInfo() instead. The problem I have with that is that
the string that's given to initReadOnlyStringInfo() comes from
walrcv_receive() and on looking at the API spec for walrcv_receive_fn
I see:

/*
 * walrcv_receive_fn
 *
 * Receive a message available from the WAL stream.  'buffer' is a pointer
 * to a buffer holding the message received.  Returns the length of the data,
 * 0 if no data is available yet ('wait_fd' is a socket descriptor which can
 * be waited on before a retry), and -1 if the cluster ended the COPY.
 */

i.e, no mention that the buffer will be NUL terminated upon return.

Looking at pqGetCopyData3(), is see the buffer does get NUL
terminated, but without the API spec mentioning this I'm not feeling
good about going ahead with wrapping that up in
initReadOnlyStringInfo() which Asserts the buffer will be NUL
terminated.

I've attached a patch which builds on the previous patch and relaxes
the rule that the StringInfo must be NUL-terminated.   The rule is
only relaxed for StringInfos that are initialized with
initReadOnlyStringInfo.  On working on this I went over the locations
where we've added code to add a '\0' char to the buffer.  If you look
at, for example, record_recv() and array_agg_deserialize() in master,
we modify the StringInfo's data to set a \0 at the end of the string.
I've removed that code as I *believe* this isn't required for the
type's receive function.

There's also an existing confusing comment in logicalrep_read_tuple()
which seems to think we're just setting the NUL terminator to conform
to StringInfo's practises. This is misleading as the NUL is required
for LOGICALREP_COLUMN_TEXT mode as we use the type's input function
instead of the receive function.  You don't have to look very hard to
find an input function that needs a NUL terminator.

I'm a bit less confident that the type's receive function will never
need to be NUL terminated. cstring_recv() came to mind as one I should
look at, but on looking I see it's not required as it just reads the
remaining bytes from the input StringInfo.  Is it safe to assume this?
or could there be some UDF receive function which requires this?

David
diff --git a/src/backend/replication/logical/applyparallelworker.c 
b/src/backend/replication/logical/applyparallelworker.c
index 82f48a488e..9b37736f8e 100644
--- a/src/backend/replication/logical/applyparallelworker.c
+++ b/src/backend/replication/logical/applyparallelworker.c
@@ -774,10 +774,7 @@ LogicalParallelApplyLoop(shm_mq_handle *mqh)
                        if (len == 0)
                                elog(ERROR, "invalid message length");
 
-                       s.cursor = 0;
-                       s.maxlen = -1;
-                       s.data = (char *) data;
-                       s.len = len;
+                       initReadOnlyStringInfo(&s, data, len);
 
                        /*
                         * The first byte of messages sent from leader apply 
worker to
diff --git a/src/backend/replication/logical/proto.c 
b/src/backend/replication/logical/proto.c
index d52c8963eb..4eaeb14148 100644
--- a/src/backend/replication/logical/proto.c
+++ b/src/backend/replication/logical/proto.c
@@ -879,6 +879,7 @@ logicalrep_read_tuple(StringInfo in, LogicalRepTupleData 
*tuple)
        /* Read the data */
        for (i = 0; i < natts; i++)
        {
+               char       *buff;
                char            kind;
                int                     len;
                StringInfo      value = &tuple->colvalues[i];
@@ -899,19 +900,18 @@ logicalrep_read_tuple(StringInfo in, LogicalRepTupleData 
*tuple)
                                len = pq_getmsgint(in, 4);      /* read length 
*/
 
                                /* and data */
-                               value->data = palloc(len + 1);
-                               pq_copymsgbytes(in, value->data, len);
+                               buff = palloc(len + 1);
+                               pq_copymsgbytes(in, buff, len);
 
                                /*
-                                * Not strictly necessary for 
LOGICALREP_COLUMN_BINARY, but
-                                * per StringInfo practice.
+                                * NUL termination is required for 
LOGICALREP_COLUMN_TEXT mode
+                                * as input function require that.  For
+                                * LOGICALREP_COLUMN_BINARY it's not 
technically required, but
+                                * it's harmless.
                                 */
-                               value->data[len] = '\0';
+                               buff[len] = '\0';
 
-                               /* make StringInfo fully valid */
-                               value->len = len;
-                               value->cursor = 0;
-                               value->maxlen = len;
+                               initStringInfoFromString(value, buff, len);
                                break;
                        default:
                                elog(ERROR, "unrecognized data representation 
type '%c'", kind);
diff --git a/src/backend/replication/logical/worker.c 
b/src/backend/replication/logical/worker.c
index 54c14495be..567485c4a4 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -3582,10 +3582,7 @@ LogicalRepApplyLoop(XLogRecPtr last_received)
                                        /* Ensure we are reading the data into 
our memory context. */
                                        
MemoryContextSwitchTo(ApplyMessageContext);
 
-                                       s.data = buf;
-                                       s.len = len;
-                                       s.cursor = 0;
-                                       s.maxlen = -1;
+                                       initReadOnlyStringInfo(&s, buf, len);
 
                                        c = pq_getmsgbyte(&s);
 
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index c900427ecf..cb68b8efaf 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1817,23 +1817,18 @@ exec_bind_message(StringInfo input_message)
 
                        if (!isNull)
                        {
-                               const char *pvalue = 
pq_getmsgbytes(input_message, plength);
+                               char       *pvalue;
 
                                /*
-                                * Rather than copying data around, we just set 
up a phony
+                                * Rather than copying data around, we just 
initialize a
                                 * StringInfo pointing to the correct portion 
of the message
-                                * buffer.  We assume we can scribble on the 
message buffer so
-                                * as to maintain the convention that 
StringInfos have a
-                                * trailing null.  This is grotty but is a big 
win when
-                                * dealing with very large parameter strings.
+                                * buffer.  We must NUL terminate the string 
for the sake of
+                                * the input function call below.
                                 */
-                               pbuf.data = unconstify(char *, pvalue);
-                               pbuf.maxlen = plength + 1;
-                               pbuf.len = plength;
-                               pbuf.cursor = 0;
-
-                               csave = pbuf.data[plength];
-                               pbuf.data[plength] = '\0';
+                               pvalue = unconstify(char *, 
pq_getmsgbytes(input_message, plength));
+                               csave = pvalue[plength];
+                               pvalue[plength] = '\0';
+                               initReadOnlyStringInfo(&pbuf, pvalue, plength);
                        }
                        else
                        {
diff --git a/src/backend/utils/adt/array_userfuncs.c 
b/src/backend/utils/adt/array_userfuncs.c
index 5c4fdcfba4..c831a9395c 100644
--- a/src/backend/utils/adt/array_userfuncs.c
+++ b/src/backend/utils/adt/array_userfuncs.c
@@ -784,7 +784,6 @@ array_agg_deserialize(PG_FUNCTION_ARGS)
                {
                        int                     itemlen;
                        StringInfoData elem_buf;
-                       char            csave;
 
                        if (result->dnulls[i])
                        {
@@ -799,28 +798,19 @@ array_agg_deserialize(PG_FUNCTION_ARGS)
                                                 errmsg("insufficient data left 
in message")));
 
                        /*
-                        * Rather than copying data around, we just set up a 
phony
-                        * StringInfo pointing to the correct portion of the 
input buffer.
-                        * We assume we can scribble on the input buffer so as 
to maintain
-                        * the convention that StringInfos have a trailing null.
+                        * Rather than copying data around, we just initialize a
+                        * StringInfo pointing to the correct portion of the 
message
+                        * buffer.
                         */
-                       elem_buf.data = &buf.data[buf.cursor];
-                       elem_buf.maxlen = itemlen + 1;
-                       elem_buf.len = itemlen;
-                       elem_buf.cursor = 0;
+                       initReadOnlyStringInfo(&elem_buf, 
&buf.data[buf.cursor], itemlen);
 
                        buf.cursor += itemlen;
 
-                       csave = buf.data[buf.cursor];
-                       buf.data[buf.cursor] = '\0';
-
                        /* Now call the element's receiveproc */
                        result->dvalues[i] = 
ReceiveFunctionCall(&iodata->typreceive,
                                                                                
                         &elem_buf,
                                                                                
                         iodata->typioparam,
                                                                                
                         -1);
-
-                       buf.data[buf.cursor] = csave;
                }
        }
 
diff --git a/src/backend/utils/adt/arrayfuncs.c 
b/src/backend/utils/adt/arrayfuncs.c
index 7828a6264b..6a920a02b7 100644
--- a/src/backend/utils/adt/arrayfuncs.c
+++ b/src/backend/utils/adt/arrayfuncs.c
@@ -1475,7 +1475,6 @@ ReadArrayBinary(StringInfo buf,
        {
                int                     itemlen;
                StringInfoData elem_buf;
-               char            csave;
 
                /* Get and check the item length */
                itemlen = pq_getmsgint(buf, 4);
@@ -1494,21 +1493,13 @@ ReadArrayBinary(StringInfo buf,
                }
 
                /*
-                * Rather than copying data around, we just set up a phony 
StringInfo
-                * pointing to the correct portion of the input buffer. We 
assume we
-                * can scribble on the input buffer so as to maintain the 
convention
-                * that StringInfos have a trailing null.
+                * Rather than copying data around, we just initialize a 
StringInfo
+                * pointing to the correct portion of the message buffer.
                 */
-               elem_buf.data = &buf->data[buf->cursor];
-               elem_buf.maxlen = itemlen + 1;
-               elem_buf.len = itemlen;
-               elem_buf.cursor = 0;
+               initReadOnlyStringInfo(&elem_buf, &buf->data[buf->cursor], 
itemlen);
 
                buf->cursor += itemlen;
 
-               csave = buf->data[buf->cursor];
-               buf->data[buf->cursor] = '\0';
-
                /* Now call the element's receiveproc */
                values[i] = ReceiveFunctionCall(receiveproc, &elem_buf,
                                                                                
typioparam, typmod);
@@ -1520,8 +1511,6 @@ ReadArrayBinary(StringInfo buf,
                                        
(errcode(ERRCODE_INVALID_BINARY_REPRESENTATION),
                                         errmsg("improper binary format in 
array element %d",
                                                        i + 1)));
-
-               buf->data[buf->cursor] = csave;
        }
 
        /*
diff --git a/src/backend/utils/adt/rowtypes.c b/src/backend/utils/adt/rowtypes.c
index ad176651d8..eb8fe95933 100644
--- a/src/backend/utils/adt/rowtypes.c
+++ b/src/backend/utils/adt/rowtypes.c
@@ -569,7 +569,6 @@ record_recv(PG_FUNCTION_ARGS)
                int                     itemlen;
                StringInfoData item_buf;
                StringInfo      bufptr;
-               char            csave;
 
                /* Ignore dropped columns in datatype, but fill with nulls */
                if (att->attisdropped)
@@ -619,25 +618,19 @@ record_recv(PG_FUNCTION_ARGS)
                        /* -1 length means NULL */
                        bufptr = NULL;
                        nulls[i] = true;
-                       csave = 0;                      /* keep compiler quiet 
*/
                }
                else
                {
+                       char       *strbuff;
+
                        /*
-                        * Rather than copying data around, we just set up a 
phony
-                        * StringInfo pointing to the correct portion of the 
input buffer.
-                        * We assume we can scribble on the input buffer so as 
to maintain
-                        * the convention that StringInfos have a trailing null.
+                        * Rather than copying data around, we just initialize a
+                        * StringInfo pointing to the correct portion of the 
message
+                        * buffer.
                         */
-                       item_buf.data = &buf->data[buf->cursor];
-                       item_buf.maxlen = itemlen + 1;
-                       item_buf.len = itemlen;
-                       item_buf.cursor = 0;
-
+                       strbuff = &buf->data[buf->cursor];
                        buf->cursor += itemlen;
-
-                       csave = buf->data[buf->cursor];
-                       buf->data[buf->cursor] = '\0';
+                       initReadOnlyStringInfo(&item_buf, strbuff, itemlen);
 
                        bufptr = &item_buf;
                        nulls[i] = false;
@@ -667,8 +660,6 @@ record_recv(PG_FUNCTION_ARGS)
                                                
(errcode(ERRCODE_INVALID_BINARY_REPRESENTATION),
                                                 errmsg("improper binary format 
in record column %d",
                                                                i + 1)));
-
-                       buf->data[buf->cursor] = csave;
                }
        }
 
diff --git a/src/common/stringinfo.c b/src/common/stringinfo.c
index 05b22b5c53..dc97754a68 100644
--- a/src/common/stringinfo.c
+++ b/src/common/stringinfo.c
@@ -70,10 +70,16 @@ initStringInfo(StringInfo str)
  *
  * Reset the StringInfo: the data buffer remains valid, but its
  * previous content, if any, is cleared.
+ *
+ * Read-only StringInfos as initialized by initReadOnlyStringInfo cannot be
+ * reset.
  */
 void
 resetStringInfo(StringInfo str)
 {
+       /* don't allow resets of read-only StringInfos */
+       Assert(str->maxlen != 0);
+
        str->data[0] = '\0';
        str->len = 0;
        str->cursor = 0;
@@ -284,6 +290,9 @@ enlargeStringInfo(StringInfo str, int needed)
 {
        int                     newlen;
 
+       /* validate this is not a read-only StringInfo */
+       Assert(str->maxlen != 0);
+
        /*
         * Guard against out-of-range "needed" values.  Without this, we can get
         * an overflow or infinite loop in the following.
diff --git a/src/include/lib/stringinfo.h b/src/include/lib/stringinfo.h
index 36a416f8e0..019129a6d3 100644
--- a/src/include/lib/stringinfo.h
+++ b/src/include/lib/stringinfo.h
@@ -20,17 +20,25 @@
 
 /*-------------------------
  * StringInfoData holds information about an extensible string.
- *             data    is the current buffer for the string (allocated with 
palloc).
- *             len             is the current string length.  There is 
guaranteed to be
- *                             a terminating '\0' at data[len], although this 
is not very
- *                             useful when the string holds binary data rather 
than text.
+ *             data    is the current buffer for the string.
+ *             len             is the current string length.  Except in the 
case of read-only
+ *                             strings described below, there is guaranteed to 
be a
+ *                             terminating '\0' at data[len].
  *             maxlen  is the allocated size in bytes of 'data', i.e. the 
maximum
  *                             string size (including the terminating '\0' 
char) that we can
  *                             currently store in 'data' without having to 
reallocate
- *                             more space.  We must always have maxlen > len.
- *             cursor  is initialized to zero by makeStringInfo or 
initStringInfo,
- *                             but is not otherwise touched by the 
stringinfo.c routines.
- *                             Some routines use it to scan through a 
StringInfo.
+ *                             more space.  We must always have maxlen > len, 
except
+ *                             in the read-only case described below.
+ *             cursor  is initialized to zero by makeStringInfo, 
initStringInfo,
+ *                             initReadOnlyStringInfo and 
initStringInfoFromString but is not
+ *                             otherwise touched by the stringinfo.c routines. 
 Some routines
+ *                             use it to scan through a StringInfo.
+ *
+ * As a special case, a StringInfoData can be initialized with a read-only
+ * string buffer.  In this case "data" does not necessarily point at a
+ * palloc'd chunk, and management of the buffer storage is the caller's
+ * responsibility.  maxlen is set to zero to indicate that this is the case.
+ * Read-only StringInfoDatas cannot be appended to or reset.
  *-------------------------
  */
 typedef struct StringInfoData
@@ -45,7 +53,7 @@ typedef StringInfoData *StringInfo;
 
 
 /*------------------------
- * There are two ways to create a StringInfo object initially:
+ * There are four ways to create a StringInfo object initially:
  *
  * StringInfo stringptr = makeStringInfo();
  *             Both the StringInfoData and the data buffer are palloc'd.
@@ -56,8 +64,31 @@ typedef StringInfoData *StringInfo;
  *             This is the easiest approach for a StringInfo object that will
  *             only live as long as the current routine.
  *
+ * StringInfoData string;
+ * initReadOnlyStringInfo(&string, existingbuf, len);
+ *             The StringInfoData's data field is set to point directly to the
+ *             existing buffer and the StringInfoData's len is set to the 
given len.
+ *             The given buffer can point to memory that's not managed by 
palloc or
+ *             is pointing partway through a palloc'd chunk.  The maxlen field 
is set
+ *             to 0.  A read-only StringInfo cannot be appended to using any 
of the
+ *             appendStringInfo functions or reset with resetStringInfo().  
The given
+ *             buffer can optionally omit the NUL termination character.
+ *
+ * StringInfoData string;
+ * initStringInfoFromString(&string, palloced_buf, len);
+ *             The StringInfoData's data field is set to point directly to the 
given
+ *             buffer and the StringInfoData's len is set to the given len.  
This
+ *             method of initialization is useful when the buffer already 
exists.
+ *             StringInfos initialized this way can be appended to using the
+ *             appendStringInfo functions and reset with resetStringInfo().  
The
+ *             given buffer must be NUL-terminated.  The palloc'd buffer is 
assumed
+ *             to be len + 1 in size.
+ *
  * To destroy a StringInfo, pfree() the data buffer, and then pfree() the
  * StringInfoData if it was palloc'd.  There's no special support for this.
+ * However, if the StringInfo was initialized using initReadOnlyStringInfo()
+ * then the caller will need to consider if it is safe to pfree the data
+ * buffer.
  *
  * NOTE: some routines build up a string using StringInfo, and then
  * release the StringInfoData but return the data string itself to their
@@ -79,6 +110,48 @@ extern StringInfo makeStringInfo(void);
  */
 extern void initStringInfo(StringInfo str);
 
+/*------------------------
+ * initReadOnlyStringInfo
+ * Initialize a StringInfoData struct from an existing string without copying
+ * the string.  The caller is responsible for ensuring the given string
+ * remains valid as long as the StringInfoData does.  Calls to this are used
+ * in performance critical locations where allocating a new buffer and copying
+ * would be too costly.  Read-only StringInfoData's may not be appended to
+ * using any of the appendStringInfo functions or reset with
+ * resetStringInfo().
+ *
+ * 'data' does not need to point directly to a palloc'd chunk of memory and may
+ * omit the NUL termination character at data[len].
+ */
+static inline void
+initReadOnlyStringInfo(StringInfo str, char *data, int len)
+{
+       str->data = data;
+       str->len = len;
+       str->maxlen = 0;                        /* read-only */
+       str->cursor = 0;
+}
+
+/*------------------------
+ * initStringInfoFromString
+ * Initialize a StringInfoData struct from an existing string without copying
+ * the string.  'data' must be a valid palloc'd chunk of memory that can have
+ * repalloc() called should more space be required during a call to any of the
+ * appendStringInfo functions.
+ *
+ * 'data' must be NUL terminated at 'len' bytes.
+ */
+static inline void
+initStringInfoFromString(StringInfo str, char *data, int len)
+{
+       Assert(data[len] == '\0');
+
+       str->data = data;
+       str->len = len;
+       str->maxlen = len + 1;
+       str->cursor = 0;
+}
+
 /*------------------------
  * resetStringInfo
  * Clears the current content of the StringInfo, if any. The

Reply via email to