On Fri, Aug 16, 2019 at 10:48 PM Binguo Bao <djydew...@gmail.com> wrote:
> [v8 patch with cosmetic changes]

Okay, looks good. I'll make a few style suggestions and corrections.
In the course of looking at this again, I have a few other questions
below as well.

It looks like you already do this for the most part, but I'll mention
that we try to keep lines, including comments, less than 80 characters
long. pgindent can try to fix that, but the results don't always look
nice.

About variable names: The iterator pointers are variously called
"iter", "iterator", and "fetch_iter". I found this confusing the first
time I read this code. I think we should use "iter" if we have only
one kind in the function, and "detoast_iter" and "fetch_iter" if we
have both kinds.
--

init_detoast_iterator():

+ * The "iterator" variable is normally just a local variable in the caller.

I don't think this comment is helpful to understand this function or its use.

+ * It only make sense to initialize de-TOAST iterator for external
on-disk value.

s/make/makes/
"a" de-TOAST iterator
s/value/values/

The comments in this function that start with "This is a ..." could be
shortened like this:

/* indirect pointer -- dereference it */

While looking at this again, I noticed we no longer need to test for
the in-line compressed case at all. I also tried some other cosmetic
rearrangements. Let me know what you think about the attached patch.
Also, I wonder if the VARATT_IS_EXTERNAL_INDIRECT case should come
first. Then the two normal cases are next to eachother.


free_detoast_iterator(), free_fetch_datum_iterator(), and free_toast_buffer():

These functions should return void.

+ * Free the memory space occupied by the de-TOAST iterator include buffers and
+ * fetch datum iterator.

Perhaps "Free memory used by the de-TOAST iterator, including buffers
and fetch datum iterator."

The check

if (iter->buf != iter->fetch_datum_iterator->buf)

is what we need to do for the compressed case. Could we use this
directly instead of having a separate state variable iter->compressed,
with a macro like this?

#define TOAST_ITER_COMPRESSED(iter) \
    (iter->buf != iter->fetch_datum_iterator->buf)

Or maybe that's too clever?


detoast_iterate():

+ * As long as there is another data chunk in compression or external storage,

We no longer use the iterator with in-line compressed values.

+ * de-TOAST it into toast buffer in iterator.

Maybe "into the iterator's toast buffer"


fetch_datum_iterate():

My remarks for detoast_iterate() also apply here.


init_toast_buffer():

+ * Note the constrain buf->position <= buf->limit may be broken
+ * at initialization. Make sure that the constrain is satisfied
+ * when consume chars.

s/constrain/constraint/ (2 times)
s/consume/consuming/

Also, this comment might be better at the top the whole function?


pglz_decompress_iterate():

+ * Decompresses source into dest until the source is exhausted.

This comment is from the original function, but I think it would be
better to highlight the differences from the original, something like:

"This function is based on pglz_decompress(), with these additional
requirements:

1. We need to save the current control byte and byte position for the
caller's next iteration.

2. In pglz_decompress(), we can assume we have all the source bytes
available. This is not the case when we decompress one chunk at a
time, so we have to make sure that we only read bytes available in the
current chunk."

(I'm not sure about the term 'byte position', maybe there's a better one.)

+ * In the while loop, sp may go beyond the srcend, provides a four-byte
+ * buffer to prevent sp from reading unallocated bytes from source buffer.
+ * When source->limit reaches source->capacity, don't worry about reading
+ * unallocated bytes.

Here's my suggestion:

"In the while loop, sp may be incremented such that it points beyond
srcend. To guard against reading beyond the end of the current chunk,
we set srcend such that we exit the loop when we are within four bytes
of the end of the current chunk. When source->limit reaches
source->capacity, we are decompressing the last chunk, so we can (and
need to) read every byte."

+ for (; ctrlc < 8 && sp < srcend && dp < destend; ctrlc++)

Note you can also replace 8 with INVALID_CTRLC here.

tuptoaster.h:
+ * Constrains that need to be satisfied:

s/constrains/constraints/

+ * If "ctrlc" field in iterator is equal to INVALID_CTRLC, it means that
+ * the field is invalid and need to read the control byte from the
+ * source buffer in the next iteration, see pglz_decompress_iterate().
+ */
+#define INVALID_CTRLC 8

I think the macro might be better placed in pg_lzcompress.h, and for
consistency used in pglz_decompress(). Then the comment can be shorter
and more general. With my additional comment in
init_detoast_iterator(), hopefully it will be clear to readers.


--
John Naylor                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/access/heap/tuptoaster.c b/src/backend/access/heap/tuptoaster.c
index 878f8aff01..d0eeb83ab7 100644
--- a/src/backend/access/heap/tuptoaster.c
+++ b/src/backend/access/heap/tuptoaster.c
@@ -366,6 +366,8 @@ bool init_detoast_iterator(struct varlena *attr, DetoastIterator iterator)
 	struct varatt_external toast_pointer;
 	if (VARATT_IS_EXTERNAL_ONDISK(attr))
 	{
+		iterator->done = false;
+
 		/*
 		 * This is an externally stored datum --- initialize fetch datum iterator
 		 */
@@ -373,21 +375,22 @@ bool init_detoast_iterator(struct varlena *attr, DetoastIterator iterator)
 		VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
 		if (VARATT_EXTERNAL_IS_COMPRESSED(toast_pointer))
 		{
-			/* If it's compressed, prepare buffer for raw data */
+			iterator->compressed = true;
+
+			/* prepare buffer to received decompressed data */
 			iterator->buf = (ToastBuffer *) palloc0(sizeof(ToastBuffer));
 			init_toast_buffer(iterator->buf, toast_pointer.va_rawsize, false);
+
+			/* initialize state for pglz_decompress_iterate() */
 			iterator->ctrl = 0;
 			iterator->ctrlc = INVALID_CTRLC;
-			iterator->compressed = true;
-			iterator->done = false;
 		}
 		else
 		{
-			iterator->buf = iterator->fetch_datum_iterator->buf;
-			iterator->ctrl = 0;
-			iterator->ctrlc = INVALID_CTRLC;
 			iterator->compressed = false;
-			iterator->done = false;
+
+			/* point the buffer directly at the raw data */
+			iterator->buf = iterator->fetch_datum_iterator->buf;
 		}
 		return true;
 	}
@@ -408,16 +411,11 @@ bool init_detoast_iterator(struct varlena *attr, DetoastIterator iterator)
 		return init_detoast_iterator(attr, iterator);
 
 	}
-	else if (VARATT_IS_COMPRESSED(attr))
+	else
 	{
-		/*
-		 * This is a compressed value inside of the main tuple
-		 * skip the iterator and just decompress the whole thing.
-		 */
+		/* in-line value -- no iteration used, even if it's compressed */
 		return false;
 	}
-
-	return false;
 }
 
 
diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index 4688313076..1752f4f9ec 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -1097,9 +1097,11 @@ textpos(PG_FUNCTION_ARGS)
 	text		*str;
 	DetoastIteratorData iteratorData;
 	DetoastIterator iter = &iteratorData;
+	struct varlena *attr = (struct varlena *)
+								DatumGetPointer(PG_GETARG_DATUM(0));
 	text	   *search_str = PG_GETARG_TEXT_PP(1);
 
-	if (init_detoast_iterator((struct varlena *) (DatumGetPointer(PG_GETARG_DATUM(0))), iter))
+	if (init_detoast_iterator(attr, iter))
 	{
 		str = (text *) iter->buf->buf;
 	}

Reply via email to