Alvaro Herrera <alvhe...@2ndquadrant.com> wrote on Wed, Sep 4, 2019 at 4:12 AM:

> > +static void
> > +init_toast_buffer(ToastBuffer *buf, int32 size, bool compressed)
> > +{
> > +     buf->buf = (const char *) palloc0(size);
>
> This API is weird -- you always palloc the ToastBuffer first, then call
> init_toast_bufer on it.  Why not palloc the ToastBuffer struct in
> init_toast_buffer and return it from there instead?  This is
> particularly strange since the ToastBuffer itself is freed by the "free"
> routine ... so it's not like we're thinking that caller can take
> ownership of the struct by embedding it in a larger struct.


I agree with you. I also changed "init_detoast_iterator" to
"create_detoast_iterator", so the caller doesn't need to manage the
memory allocation of the iterator.


> Also, this function needs a comment on top explaining what it does and
> what the params are.
>

Done.


> Why do we need ToastBuffer->buf_size?  Seems unused.
>
> > +     if (iter == NULL)
> > +     {
> > +             return;
> > +     }
>

Removed.


> Please, no braces around single-statement blocks.  (Many places).
>

Done.


> > +/*
> > + * If "ctrlc" field in iterator is equal to INVALID_CTRLC, it means that
> > + * the field is invalid and need to read the control byte from the
> > + * source buffer in the next iteration, see pglz_decompress_iterate().
> > + */
> > +#define INVALID_CTRLC 8
>
> What does CTRLC stand for?  Also: this comment should explain why the
> value 8 is what it is.
>

I've improved the comment.


>
> > +                             /*
> > +                              * Now we copy the bytes specified by the tag from OUTPUT to
> > +                              * OUTPUT. It is dangerous and platform dependent to use
> > +                              * memcpy() here, because the copied areas could overlap
> > +                              * extremely!
> > +                              */
> > +                             len = Min(len, destend - dp);
> > +                             while (len--)
> > +                             {
> > +                                     *dp = dp[-off];
> > +                                     dp++;
> > +                             }
>
> So why not use memmove?
>
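
On memmove, a small sketch rather than a definitive answer: when the offset
is smaller than the match length, the loop deliberately re-reads bytes it has
just written, so a short pattern gets repeated. memmove() copies the source
range as it was before the call, which gives a different result. The helper
names below are made up for illustration and are not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Byte-wise back-reference copy, as in the quoted loop: each byte is read
 * from "off" bytes behind the write position, so bytes written earlier in
 * this same copy can be read again.
 */
static void
copy_match_bytewise(unsigned char *dp, int off, int len)
{
	assert(off > 0 && len >= 0);
	while (len--)
	{
		*dp = dp[-off];
		dp++;
	}
}

/* Hypothetical alternative: copies the source bytes as they were before the call. */
static void
copy_match_memmove(unsigned char *dp, int off, int len)
{
	assert(off > 0 && len >= 0);
	memmove(dp, dp - off, (size_t) len);
}

/* Write "ab" into out (at least 9 bytes), then expand a match with off = 2, len = 6. */
static void
expand_example(char *out, int use_memmove)
{
	memset(out, 0, 9);
	out[0] = 'a';
	out[1] = 'b';
	if (use_memmove)
		copy_match_memmove((unsigned char *) out + 2, 2, 6);
	else
		copy_match_bytewise((unsigned char *) out + 2, 2, 6);
}
```

With the byte-wise loop the buffer ends up holding "abababab"; with memmove()
only "abab" is produced and the remaining bytes stay zero.
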
> > +                             /*
> > +                              * Otherwise it contains the match length minus 3 and the
> > +                              * upper 4 bits of the offset. The next following byte
> > +                              * contains the lower 8 bits of the offset. If the length is
> > +                              * coded as 18, another extension tag byte tells how much
> > +                              * longer the match really was (0-255).
> > +                              */
> > +                             int32           len;
> > +                             int32           off;
> > +
> > +                             len = (sp[0] & 0x0f) + 3;
> > +                             off = ((sp[0] & 0xf0) << 4) | sp[1];
> > +                             sp += 2;
> > +                             if (len == 18)
> > +                                     len += *sp++;
>
> Starting this para with "Otherwise" makes no sense, since there's no
> previous opposite case.  Please reword.  However, I don't recognize this
> code from anywhere, and it seems to have a lot of magical numbers.  Is
> this code completely new?
>

This function is based on pglz_decompress() in src/common/pg_lzcompress.c,
and I've mentioned that in the function's comment at the beginning.
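
For readers without pg_lzcompress.c at hand, here is a small standalone sketch
of how a match tag is decoded, using the same masks as the quoted code.
MatchTag and decode_match_tag are names invented for this example. Unlike
pglz_decompress_iterate(), the sketch assumes all tag bytes are already
available in memory.

```c
#include <assert.h>
#include <stddef.h>

typedef struct MatchTag
{
	int			len;			/* match length in bytes */
	int			off;			/* distance back into the output */
	int			used;			/* number of tag bytes consumed */
} MatchTag;

/* Decode one match tag starting at sp (hypothetical helper). */
static MatchTag
decode_match_tag(const unsigned char *sp)
{
	MatchTag	tag;

	assert(sp != NULL);

	/* low nibble of byte 0: match length minus 3 */
	tag.len = (sp[0] & 0x0f) + 3;
	/* high nibble of byte 0: upper 4 offset bits; byte 1: lower 8 bits */
	tag.off = ((sp[0] & 0xf0) << 4) | sp[1];
	tag.used = 2;

	/* a coded length of 18 means one extension byte (0-255) follows */
	if (tag.len == 18)
		tag.len += sp[tag.used++];

	return tag;
}
```

For example, the bytes 0x12 0x34 decode to a length of 5 and an offset of 308,
while 0xFF 0xFF 0x0A decode to a length of 28 and an offset of 4095.
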


> Didn't much like FetchDatumIteratorData SnapshotToast struct member
> name.  How about just "snapshot"?
>

Done.

> > +#define PG_DETOAST_ITERATE(iter, need)                                      \
> > +     do {                                                                   \
> > +             Assert(need >= iter->buf->buf && need <= iter->buf->capacity); \
> > +             while (!iter->done && need >= iter->buf->limit) {              \
> > +                     detoast_iterate(iter);                                 \
> > +             }                                                              \
> > +     } while (0)
>
> This needs parens around each "iter" and "need" in the macro definition.
> Also, please add a comment documenting what the arguments are, since
> it's not immediately obvious.
>

Parens make the macro more reliable. Done.
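
To illustrate the point with a toy case (these macros are hypothetical and not
from the patch): without parens, an argument containing an operator that binds
more loosely than ">=" is silently regrouped.

```c
#include <assert.h>

/* Hypothetical macros: the same comparison without and with parens. */
#define REACHED_BAD(need, limit)	(need >= limit)
#define REACHED_GOOD(need, limit)	((need) >= (limit))

static int
reached_bad(int value, int mask, int limit)
{
	/* expands to (value & mask >= limit), i.e. value & (mask >= limit) */
	return REACHED_BAD(value & mask, limit);
}

static int
reached_good(int value, int mask, int limit)
{
	/* expands to ((value & mask) >= (limit)) */
	return REACHED_GOOD(value & mask, limit);
}
```

With value = 6, mask = 4 and limit = 4, the parenthesized version yields 1
while the unparenthesized one yields 0.
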

> > +void free_detoast_iterator(DetoastIterator iter)
> > +{
> > +     if (iter == NULL)
> > +     {
> > +             return;
> > +     }
>
> If this function is going to do this, why do callers need to check for
> NULL also?  Seems pointless.  I'd rather make callers simpler and keep
> only the NULL-check inside the function, since this is not perf-critical
> anyway.
>

Good catch. Done.

> > +             iter->fetch_datum_iterator = create_fetch_datum_iterator(attr);
> > +             VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
> > +             if (VARATT_EXTERNAL_IS_COMPRESSED(toast_pointer))
> > +             {
> > [...]
> > +             }
> > +             else
> > +             {
> > +                     iter->compressed = false;
> > +
> > +                     /* point the buffer directly at the raw data */
> > +                     iter->buf = iter->fetch_datum_iterator->buf;
> > +             }
>
> This arrangement where there are two ToastBuffers and they sometimes are
> the same is cute, but I think we need a better way to know when each
> needs to be freed afterwards;
>

We only need to check the "compressed" field in the iterator to figure out
which buffer should be freed.
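
As a rough sketch of that ownership rule, with simplified stand-in structs and
malloc/free in place of palloc/pfree (all names here are invented for the
example, and buffers_freed exists only to make the behavior observable):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

typedef struct Buffer
{
	char	   *data;
} Buffer;

typedef struct FetchIter
{
	Buffer	   *buf;			/* always owned by the fetch iterator */
} FetchIter;

typedef struct DetoastIter
{
	Buffer	   *buf;			/* own buffer if compressed, else fetch->buf */
	FetchIter  *fetch;
	bool		compressed;
} DetoastIter;

static int	buffers_freed = 0;	/* bookkeeping for this example only */

static Buffer *
create_buffer(size_t size)
{
	Buffer	   *buf = malloc(sizeof(Buffer));

	if (buf == NULL || (buf->data = calloc(size, 1)) == NULL)
		abort();
	return buf;
}

static void
free_buffer(Buffer *buf)
{
	assert(buf != NULL);
	free(buf->data);
	free(buf);
	buffers_freed++;
}

static DetoastIter *
create_iter(bool compressed, size_t size)
{
	DetoastIter *iter = malloc(sizeof(DetoastIter));
	FetchIter  *fetch = malloc(sizeof(FetchIter));

	if (iter == NULL || fetch == NULL)
		abort();
	fetch->buf = create_buffer(size);
	iter->fetch = fetch;
	iter->compressed = compressed;
	/* compressed: separate output buffer; raw: alias the fetch buffer */
	iter->buf = compressed ? create_buffer(size) : fetch->buf;
	return iter;
}

static void
free_iter(DetoastIter *iter)
{
	if (iter == NULL)
		return;
	if (iter->compressed)
		free_buffer(iter->buf);		/* the decompression buffer */
	free_buffer(iter->fetch->buf);	/* freed exactly once in both cases */
	free(iter->fetch);
	free(iter);
}
```

In the compressed case two buffers are freed; in the uncompressed case the
shared buffer is freed once, through the fetch iterator.
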

-- 
Best regards,
Binguo Bao
From 90b8ee9af981a3d7a36edb275f7c8c7d3fdebfb6 Mon Sep 17 00:00:00 2001
From: BBG <djydew...@gmail.com>
Date: Tue, 4 Jun 2019 22:56:42 +0800
Subject: [PATCH] de-TOASTing using an iterator

---
 src/backend/access/common/detoast.c         | 103 ++++++++
 src/backend/access/common/toast_internals.c | 355 ++++++++++++++++++++++++++++
 src/backend/utils/adt/varlena.c             |  29 ++-
 src/include/access/detoast.h                |  97 ++++++++
 src/include/access/toast_internals.h        |  10 +
 src/include/fmgr.h                          |  13 +
 6 files changed, 601 insertions(+), 6 deletions(-)

diff --git a/src/backend/access/common/detoast.c b/src/backend/access/common/detoast.c
index c8b49d6..905deac 100644
--- a/src/backend/access/common/detoast.c
+++ b/src/backend/access/common/detoast.c
@@ -290,6 +290,109 @@ heap_tuple_untoast_attr_slice(struct varlena *attr,
 }
 
 /* ----------
+ * create_detoast_iterator -
+ *
+ * It only makes sense to initialize a de-TOAST iterator for external on-disk values.
+ *
+ * ----------
+ */
+DetoastIterator
+create_detoast_iterator(struct varlena *attr)
+{
+	struct varatt_external toast_pointer;
+	DetoastIterator iter;
+	if (VARATT_IS_EXTERNAL_ONDISK(attr))
+	{
+		iter = (DetoastIterator) palloc0(sizeof(DetoastIteratorData));
+		iter->done = false;
+
+		/* This is an externally stored datum --- initialize fetch datum iterator */
+		iter->fetch_datum_iterator = create_fetch_datum_iterator(attr);
+		VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
+		if (VARATT_EXTERNAL_IS_COMPRESSED(toast_pointer))
+		{
+			iter->compressed = true;
+
+			/* prepare buffer to receive decompressed data */
+			iter->buf = create_toast_buffer(toast_pointer.va_rawsize, false);
+
+			/* initialize state for pglz_decompress_iterate() */
+			iter->ctrl = 0;
+			iter->ctrlc = INVALID_CTRLC;
+		}
+		else
+		{
+			iter->compressed = false;
+
+			/* point the buffer directly at the raw data */
+			iter->buf = iter->fetch_datum_iterator->buf;
+		}
+		return iter;
+	}
+	else if (VARATT_IS_EXTERNAL_INDIRECT(attr))
+	{
+		/* indirect pointer --- dereference it */
+		struct varatt_indirect redirect;
+
+		VARATT_EXTERNAL_GET_POINTER(redirect, attr);
+		attr = (struct varlena *) redirect.pointer;
+
+		/* nested indirect Datums aren't allowed */
+		Assert(!VARATT_IS_EXTERNAL_INDIRECT(attr));
+
+		/* recurse in case value is still extended in some other way */
+		return create_detoast_iterator(attr);
+
+	}
+	else
+		/* in-line value -- no iteration used, even if it's compressed */
+		return NULL;
+}
+
+/* ----------
+ * free_detoast_iterator -
+ *
+ * Free memory used by the de-TOAST iterator, including buffers and
+ * fetch datum iterator.
+ * ----------
+ */
+void
+free_detoast_iterator(DetoastIterator iter)
+{
+	if (iter == NULL)
+		return;
+	if (iter->compressed)
+		free_toast_buffer(iter->buf);
+	free_fetch_datum_iterator(iter->fetch_datum_iterator);
+	pfree(iter);
+}
+
+/* ----------
+ * detoast_iterate -
+ *
+ * Iterate through the toasted value referenced by iterator.
+ *
+ * As long as there is another data chunk in external storage,
+ * de-TOAST it into iterator's toast buffer.
+ * ----------
+ */
+void
+detoast_iterate(DetoastIterator detoast_iter)
+{
+	FetchDatumIterator fetch_iter = detoast_iter->fetch_datum_iterator;
+
+	Assert(detoast_iter != NULL && !detoast_iter->done);
+
+	fetch_datum_iterate(fetch_iter);
+
+	if (detoast_iter->compressed)
+		pglz_decompress_iterate(fetch_iter->buf, detoast_iter->buf, detoast_iter);
+
+	if (detoast_iter->buf->limit == detoast_iter->buf->capacity)
+		detoast_iter->done = true;
+}
+
+/* ----------
  * toast_fetch_datum -
  *
  *	Reconstruct an in memory Datum from the chunks saved
diff --git a/src/backend/access/common/toast_internals.c b/src/backend/access/common/toast_internals.c
index a971242..ac8ae77 100644
--- a/src/backend/access/common/toast_internals.c
+++ b/src/backend/access/common/toast_internals.c
@@ -630,3 +630,358 @@ init_toast_snapshot(Snapshot toast_snapshot)
 
 	InitToastSnapshot(*toast_snapshot, snapshot->lsn, snapshot->whenTaken);
 }
+
+/* ----------
+ * create_fetch_datum_iterator -
+ *
+ * Initialize fetch datum iterator.
+ * ----------
+ */
+FetchDatumIterator
+create_fetch_datum_iterator(struct varlena *attr)
+{
+	int			validIndex;
+	FetchDatumIterator iter;
+
+	if (!VARATT_IS_EXTERNAL_ONDISK(attr))
+		elog(ERROR, "create_fetch_datum_iterator shouldn't be called for non-ondisk datums");
+
+	iter = (FetchDatumIterator) palloc0(sizeof(FetchDatumIteratorData));
+
+	/* Must copy to access aligned fields */
+	VARATT_EXTERNAL_GET_POINTER(iter->toast_pointer, attr);
+
+	iter->ressize = iter->toast_pointer.va_extsize;
+	iter->numchunks = ((iter->ressize - 1) / TOAST_MAX_CHUNK_SIZE) + 1;
+
+	/*
+	 * Open the toast relation and its indexes
+	 */
+	iter->toastrel = table_open(iter->toast_pointer.va_toastrelid, AccessShareLock);
+
+	/* Look for the valid index of the toast relation */
+	validIndex = toast_open_indexes(iter->toastrel,
+									AccessShareLock,
+									&iter->toastidxs,
+									&iter->num_indexes);
+
+	/*
+	 * Setup a scan key to fetch from the index by va_valueid
+	 */
+	ScanKeyInit(&iter->toastkey,
+				(AttrNumber) 1,
+				BTEqualStrategyNumber, F_OIDEQ,
+				ObjectIdGetDatum(iter->toast_pointer.va_valueid));
+
+	/*
+	 * Read the chunks by index
+	 *
+	 * Note that because the index is actually on (valueid, chunkidx) we will
+	 * see the chunks in chunkidx order, even though we didn't explicitly ask
+	 * for it.
+	 */
+
+	init_toast_snapshot(&iter->snapshot);
+	iter->toastscan = systable_beginscan_ordered(iter->toastrel, iter->toastidxs[validIndex],
+												 &iter->snapshot, 1, &iter->toastkey);
+
+	iter->buf = create_toast_buffer(iter->ressize + VARHDRSZ,
+		VARATT_EXTERNAL_IS_COMPRESSED(iter->toast_pointer));
+
+	iter->nextidx = 0;
+	iter->done = false;
+
+	return iter;
+}
+
+void
+free_fetch_datum_iterator(FetchDatumIterator iter)
+{
+	if (iter == NULL)
+		return;
+
+	if (!iter->done)
+	{
+		systable_endscan_ordered(iter->toastscan);
+		toast_close_indexes(iter->toastidxs, iter->num_indexes, AccessShareLock);
+		table_close(iter->toastrel, AccessShareLock);
+	}
+	free_toast_buffer(iter->buf);
+	pfree(iter);
+}
+
+/* ----------
+ * fetch_datum_iterate -
+ *
+ * Iterate through the toasted value referenced by iterator.
+ *
+ * As long as there is another data chunk in external storage,
+ * fetch it into the iterator's toast buffer.
+ * ----------
+ */
+void
+fetch_datum_iterate(FetchDatumIterator iter)
+{
+	HeapTuple	ttup;
+	TupleDesc	toasttupDesc;
+	int32		residx;
+	Pointer		chunk;
+	bool		isnull;
+	char		*chunkdata;
+	int32		chunksize;
+
+	Assert(iter != NULL && !iter->done);
+
+	ttup = systable_getnext_ordered(iter->toastscan, ForwardScanDirection);
+	if (ttup == NULL)
+	{
+		/*
+		 * Final checks that we successfully fetched the datum
+		 */
+		if (iter->nextidx != iter->numchunks)
+			elog(ERROR, "missing chunk number %d for toast value %u in %s",
+				 iter->nextidx,
+				 iter->toast_pointer.va_valueid,
+				 RelationGetRelationName(iter->toastrel));
+
+		/*
+		 * End scan and close relations
+		 */
+		systable_endscan_ordered(iter->toastscan);
+		toast_close_indexes(iter->toastidxs, iter->num_indexes, AccessShareLock);
+		table_close(iter->toastrel, AccessShareLock);
+
+		iter->done = true;
+		return;
+	}
+
+	/*
+	 * Have a chunk, extract the sequence number and the data
+	 */
+	toasttupDesc = iter->toastrel->rd_att;
+	residx = DatumGetInt32(fastgetattr(ttup, 2, toasttupDesc, &isnull));
+	Assert(!isnull);
+	chunk = DatumGetPointer(fastgetattr(ttup, 3, toasttupDesc, &isnull));
+	Assert(!isnull);
+	if (!VARATT_IS_EXTENDED(chunk))
+	{
+		chunksize = VARSIZE(chunk) - VARHDRSZ;
+		chunkdata = VARDATA(chunk);
+	}
+	else if (VARATT_IS_SHORT(chunk))
+	{
+		/* could happen due to heap_form_tuple doing its thing */
+		chunksize = VARSIZE_SHORT(chunk) - VARHDRSZ_SHORT;
+		chunkdata = VARDATA_SHORT(chunk);
+	}
+	else
+	{
+		/* should never happen */
+		elog(ERROR, "found toasted toast chunk for toast value %u in %s",
+			 iter->toast_pointer.va_valueid,
+			 RelationGetRelationName(iter->toastrel));
+		chunksize = 0;		/* keep compiler quiet */
+		chunkdata = NULL;
+	}
+
+	/*
+	 * Some checks on the data we've found
+	 */
+	if (residx != iter->nextidx)
+		elog(ERROR, "unexpected chunk number %d (expected %d) for toast value %u in %s",
+			 residx, iter->nextidx,
+			 iter->toast_pointer.va_valueid,
+			 RelationGetRelationName(iter->toastrel));
+	if (residx < iter->numchunks - 1)
+	{
+		if (chunksize != TOAST_MAX_CHUNK_SIZE)
+			elog(ERROR, "unexpected chunk size %d (expected %d) in chunk %d of %d for toast value %u in %s",
+				 chunksize, (int) TOAST_MAX_CHUNK_SIZE,
+				 residx, iter->numchunks,
+				 iter->toast_pointer.va_valueid,
+				 RelationGetRelationName(iter->toastrel));
+	}
+	else if (residx == iter->numchunks - 1)
+	{
+		if ((residx * TOAST_MAX_CHUNK_SIZE + chunksize) != iter->ressize)
+			elog(ERROR, "unexpected chunk size %d (expected %d) in final chunk %d for toast value %u in %s",
+				 chunksize,
+				 (int) (iter->ressize - residx * TOAST_MAX_CHUNK_SIZE),
+				 residx,
+				 iter->toast_pointer.va_valueid,
+				 RelationGetRelationName(iter->toastrel));
+	}
+	else
+		elog(ERROR, "unexpected chunk number %d (out of range %d..%d) for toast value %u in %s",
+			 residx,
+			 0, iter->numchunks - 1,
+			 iter->toast_pointer.va_valueid,
+			 RelationGetRelationName(iter->toastrel));
+
+	/*
+	 * Copy the data into proper place in our iterator buffer
+	 */
+	memcpy(iter->buf->limit, chunkdata, chunksize);
+	iter->buf->limit += chunksize;
+
+	iter->nextidx++;
+}
+
+/* ----------
+ * create_toast_buffer -
+ *
+ * Create and initialize a TOAST buffer.
+ *
+ * size: buffer size, including the header
+ * compressed: whether TOAST value is compressed
+ * ----------
+ */
+ToastBuffer *
+create_toast_buffer(int32 size, bool compressed)
+{
+	ToastBuffer *buf = (ToastBuffer *) palloc0(sizeof(ToastBuffer));
+	buf->buf = (const char *) palloc0(size);
+	if (compressed) {
+		SET_VARSIZE_COMPRESSED(buf->buf, size);
+		/*
+		 * Note the constraint buf->position <= buf->limit may be broken
+		 * at initialization. Make sure that the constraint is satisfied
+		 * when consuming chars.
+		 */
+		buf->position = VARDATA_4B_C(buf->buf);
+	}
+	else
+	{
+		SET_VARSIZE(buf->buf, size);
+		buf->position = VARDATA_4B(buf->buf);
+	}
+	buf->limit = VARDATA(buf->buf);
+	buf->capacity = buf->buf + size;
+
+	return buf;
+}
+
+void
+free_toast_buffer(ToastBuffer *buf)
+{
+	if (buf == NULL)
+		return;
+
+	pfree((void *)buf->buf);
+	pfree(buf);
+}
+
+/* ----------
+ * pglz_decompress_iterate -
+ *
+ * This function is based on pglz_decompress(), with these additional
+ * requirements:
+ *
+ * 1. We need to save the current control byte and byte position for the
+ * caller's next iteration.
+ *
+ * 2. In pglz_decompress(), we can assume we have all the source bytes
+ * available. This is not the case when we decompress one chunk at a
+ * time, so we have to make sure that we only read bytes available in the
+ * current chunk.
+ * ----------
+ */
+void
+pglz_decompress_iterate(ToastBuffer *source, ToastBuffer *dest, DetoastIterator iter)
+{
+	const unsigned char *sp;
+	const unsigned char *srcend;
+	unsigned char *dp;
+	unsigned char *destend;
+
+	/*
+	 * In the while loop, sp may be incremented such that it points beyond
+	 * srcend. To guard against reading beyond the end of the current chunk,
+	 * we set srcend such that we exit the loop when we are within four bytes
+	 * of the end of the current chunk. When source->limit reaches
+	 * source->capacity, we are decompressing the last chunk, so we can (and
+	 * need to) read every byte.
+	 */
+	srcend = (const unsigned char *)
+		(source->limit == source->capacity ? source->limit : (source->limit - 4));
+	sp = (const unsigned char *) source->position;
+	dp = (unsigned char *) dest->limit;
+	destend = (unsigned char *) dest->capacity;
+
+	while (sp < srcend && dp < destend)
+	{
+		/*
+		 * Read one control byte and process the next 8 items (or as many as
+		 * remain in the compressed input).
+		 */
+		unsigned char ctrl;
+		int			ctrlc;
+
+		if (iter->ctrlc != INVALID_CTRLC)
+		{
+			ctrl = iter->ctrl;
+			ctrlc = iter->ctrlc;
+		}
+		else
+		{
+			ctrl = *sp++;
+			ctrlc = 0;
+		}
+
+
+		for (; ctrlc < INVALID_CTRLC && sp < srcend && dp < destend; ctrlc++)
+		{
+
+			if (ctrl & 1)
+			{
+				/*
+				 * A set control bit means MATCH TAG. The tag contains the match
+				 * length minus 3 and the upper 4 bits of the offset. The next
+				 * byte contains the lower 8 bits of the offset. If the length is
+				 * coded as 18, another extension tag byte tells how much longer
+				 * the match really was (0-255).
+				 */
+				int32		len;
+				int32		off;
+
+				len = (sp[0] & 0x0f) + 3;
+				off = ((sp[0] & 0xf0) << 4) | sp[1];
+				sp += 2;
+				if (len == 18)
+					len += *sp++;
+
+				/*
+				 * Now we copy the bytes specified by the tag from OUTPUT to
+				 * OUTPUT. It is dangerous and platform dependent to use
+				 * memcpy() here, because the copied areas could overlap
+				 * extremely!
+				 */
+				len = Min(len, destend - dp);
+				while (len--)
+				{
+					*dp = dp[-off];
+					dp++;
+				}
+			}
+			else
+			{
+				/*
+				 * An unset control bit means LITERAL BYTE. So we just copy
+				 * one from INPUT to OUTPUT.
+				 */
+				*dp++ = *sp++;
+			}
+
+			/*
+			 * Advance the control bit
+			 */
+			ctrl >>= 1;
+		}
+
+		iter->ctrlc = ctrlc;
+		iter->ctrl = ctrl;
+	}
+
+	source->position = (char *) sp;
+	dest->limit = (char *) dp;
+}
diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index d36156f..e93870b 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -56,6 +56,8 @@ typedef struct
 	int			len1;			/* string lengths in bytes */
 	int			len2;
 
+	DetoastIterator iter;
+
 	/* Skip table for Boyer-Moore-Horspool search algorithm: */
 	int			skiptablemask;	/* mask for ANDing with skiptable subscripts */
 	int			skiptable[256]; /* skip distance for given mismatched char */
@@ -122,7 +124,7 @@ static text *text_substring(Datum str,
 							int32 length,
 							bool length_not_specified);
 static text *text_overlay(text *t1, text *t2, int sp, int sl);
-static int	text_position(text *t1, text *t2, Oid collid);
+static int	text_position(text *t1, text *t2, Oid collid, DetoastIterator iter);
 static void text_position_setup(text *t1, text *t2, Oid collid, TextPositionState *state);
 static bool text_position_next(TextPositionState *state);
 static char *text_position_next_internal(char *start_ptr, TextPositionState *state);
@@ -1092,10 +1094,18 @@ text_overlay(text *t1, text *t2, int sp, int sl)
 Datum
 textpos(PG_FUNCTION_ARGS)
 {
-	text	   *str = PG_GETARG_TEXT_PP(0);
+	text		*str;
+	struct varlena *attr = (struct varlena *)
+								DatumGetPointer(PG_GETARG_DATUM(0));
 	text	   *search_str = PG_GETARG_TEXT_PP(1);
+	DetoastIterator iter = create_detoast_iterator(attr);
+
+	if (iter != NULL)
+		str = (text *) iter->buf->buf;
+	else
+		str = PG_GETARG_TEXT_PP(0);
 
-	PG_RETURN_INT32((int32) text_position(str, search_str, PG_GET_COLLATION()));
+	PG_RETURN_INT32((int32) text_position(str, search_str, PG_GET_COLLATION(), iter));
 }
 
 /*
@@ -1113,7 +1123,7 @@ textpos(PG_FUNCTION_ARGS)
  *	functions.
  */
 static int
-text_position(text *t1, text *t2, Oid collid)
+text_position(text *t1, text *t2, Oid collid, DetoastIterator iter)
 {
 	TextPositionState state;
 	int			result;
@@ -1122,6 +1132,7 @@ text_position(text *t1, text *t2, Oid collid)
 		return 0;
 
 	text_position_setup(t1, t2, collid, &state);
+	state.iter = iter;
 	if (!text_position_next(&state))
 		result = 0;
 	else
@@ -1130,7 +1141,6 @@ text_position(text *t1, text *t2, Oid collid)
 	return result;
 }
 
-
 /*
  * text_position_setup, text_position_next, text_position_cleanup -
  *	Component steps of text_position()
@@ -1196,6 +1206,7 @@ text_position_setup(text *t1, text *t2, Oid collid, TextPositionState *state)
 	state->str2 = VARDATA_ANY(t2);
 	state->len1 = len1;
 	state->len2 = len2;
+	state->iter = NULL;
 	state->last_match = NULL;
 	state->refpoint = state->str1;
 	state->refpos = 0;
@@ -1358,6 +1369,9 @@ text_position_next_internal(char *start_ptr, TextPositionState *state)
 		hptr = start_ptr;
 		while (hptr < haystack_end)
 		{
+			if (state->iter != NULL)
+				PG_DETOAST_ITERATE(state->iter, hptr);
+
 			if (*hptr == nchar)
 				return (char *) hptr;
 			hptr++;
@@ -1375,6 +1389,9 @@ text_position_next_internal(char *start_ptr, TextPositionState *state)
 			const char *nptr;
 			const char *p;
 
+			if (state->iter != NULL)
+				PG_DETOAST_ITERATE(state->iter, hptr);
+
 			nptr = needle_last;
 			p = hptr;
 			while (*nptr == *p)
@@ -1438,7 +1455,7 @@ text_position_get_match_pos(TextPositionState *state)
 static void
 text_position_cleanup(TextPositionState *state)
 {
-	/* no cleanup needed */
+	free_detoast_iterator(state->iter);
 }
 
 static void
diff --git a/src/include/access/detoast.h b/src/include/access/detoast.h
index 02029a9..0aa38eb 100644
--- a/src/include/access/detoast.h
+++ b/src/include/access/detoast.h
@@ -73,12 +73,109 @@ extern struct varlena *heap_tuple_untoast_attr_slice(struct varlena *attr,
 							  int32 sliceoffset,
 							  int32 slicelength);
 
+#ifndef FRONTEND
+#include "access/genam.h"
+
+/*
+ * TOAST buffer is a producer consumer buffer.
+ *
+ *    +--+--+--+--+--+--+--+--+--+--+--+--+--+
+ *    |  |  |  |  |  |  |  |  |  |  |  |  |  |
+ *    +--+--+--+--+--+--+--+--+--+--+--+--+--+
+ *    ^           ^           ^              ^
+ *   buf      position      limit         capacity
+ *
+ * buf: points to the start of the buffer.
+ * position: points to the next char to be consumed.
+ * limit: points to the next char to be produced.
+ * capacity: points to the end of the buffer.
+ *
+ * Constraints that need to be satisfied:
+ * buf <= position <= limit <= capacity
+ */
+typedef struct ToastBuffer
+{
+	const char	*buf;
+	const char	*position;
+	char		*limit;
+	const char	*capacity;
+} ToastBuffer;
+
+typedef struct FetchDatumIteratorData
+{
+	ToastBuffer	*buf;
+	Relation	toastrel;
+	Relation	*toastidxs;
+	SysScanDesc	toastscan;
+	ScanKeyData	toastkey;
+	SnapshotData			snapshot;
+	struct varatt_external	toast_pointer;
+	int32		ressize;
+	int32		nextidx;
+	int32		numchunks;
+	int			num_indexes;
+	bool		done;
+}				FetchDatumIteratorData;
+
+typedef struct FetchDatumIteratorData *FetchDatumIterator;
+
+/*
+ * "ctrlc" counts the control bits already consumed from "ctrl". A control
+ * byte has 8 bits, so ctrlc == INVALID_CTRLC means the next control byte
+ * must be read from the source buffer, see pglz_decompress_iterate().
+ */
+#define INVALID_CTRLC 8
+
+typedef struct DetoastIteratorData
+{
+	ToastBuffer 		*buf;
+	FetchDatumIterator	fetch_datum_iterator;
+	unsigned char		ctrl;
+	int					ctrlc;
+	bool				compressed;		/* toast value is compressed? */
+	bool				done;
+}			DetoastIteratorData;
+
+typedef struct DetoastIteratorData *DetoastIterator;
+
+/* ----------
+ * create_detoast_iterator -
+ *
+ * It only makes sense to initialize a de-TOAST iterator for external on-disk values.
+ *
+ * ----------
+ */
+extern DetoastIterator create_detoast_iterator(struct varlena *attr);
+
+/* ----------
+ * free_detoast_iterator -
+ *
+ * Free memory used by the de-TOAST iterator, including buffers and
+ * fetch datum iterator.
+ * ----------
+ */
+extern void free_detoast_iterator(DetoastIterator iter);
+
+/* ----------
+ * detoast_iterate -
+ *
+ * Iterate through the toasted value referenced by iterator.
+ *
+ * As long as there is another data chunk in external storage,
+ * de-TOAST it into iterator's toast buffer.
+ * ----------
+ */
+extern void detoast_iterate(DetoastIterator detoast_iter);
+
+#endif
+
 /* ----------
  * toast_raw_datum_size -
  *
  *	Return the raw (detoasted) size of a varlena datum
  * ----------
  */
+
 extern Size toast_raw_datum_size(Datum value);
 
 /* ----------
diff --git a/src/include/access/toast_internals.h b/src/include/access/toast_internals.h
index 494b07a..18c7000 100644
--- a/src/include/access/toast_internals.h
+++ b/src/include/access/toast_internals.h
@@ -15,6 +15,7 @@
 #include "storage/lockdefs.h"
 #include "utils/relcache.h"
 #include "utils/snapshot.h"
+#include "detoast.h"
 
 /*
  *	The information at the start of the compressed toast data.
@@ -51,4 +52,13 @@ extern void toast_close_indexes(Relation *toastidxs, int num_indexes,
 								LOCKMODE lock);
 extern void init_toast_snapshot(Snapshot toast_snapshot);
 
+
+extern FetchDatumIterator create_fetch_datum_iterator(struct varlena *attr);
+extern void free_fetch_datum_iterator(FetchDatumIterator iter);
+extern void fetch_datum_iterate(FetchDatumIterator iter);
+extern ToastBuffer *create_toast_buffer(int32 size, bool compressed);
+extern void free_toast_buffer(ToastBuffer *buf);
+extern void pglz_decompress_iterate(ToastBuffer *source, ToastBuffer *dest,
+									DetoastIterator iter);
+
 #endif							/* TOAST_INTERNALS_H */
diff --git a/src/include/fmgr.h b/src/include/fmgr.h
index 29ae467..6ceb6bd 100644
--- a/src/include/fmgr.h
+++ b/src/include/fmgr.h
@@ -239,6 +239,19 @@ extern struct varlena *pg_detoast_datum_packed(struct varlena *datum);
 #define PG_DETOAST_DATUM_SLICE(datum,f,c) \
 		pg_detoast_datum_slice((struct varlena *) DatumGetPointer(datum), \
 		(int32) (f), (int32) (c))
+
+/*
+ * Support for de-TOASTing a toasted value iteratively. "need" is a pointer
+ * between the beginning and end of the iterator's ToastBuffer. The macro
+ * de-TOASTs all bytes before "need" into the iterator's ToastBuffer.
+ */
+#define PG_DETOAST_ITERATE(iter, need)											\
+	do {																		\
+		Assert((need) >= (iter)->buf->buf && (need) <= (iter)->buf->capacity);	\
+		while (!(iter)->done && (need) >= (iter)->buf->limit) { 				\
+			detoast_iterate(iter);												\
+		}																		\
+	} while (0)
 /* WARNING -- unaligned pointer */
 #define PG_DETOAST_DATUM_PACKED(datum) \
 	pg_detoast_datum_packed((struct varlena *) DatumGetPointer(datum))
-- 
2.7.4
