Re: [proposal] de-TOAST'ing using a iterator

2019-09-23 Thread Binguo Bao
Alvaro Herrera  于2019年9月17日周二 上午5:51写道:

> On 2019-Sep-10, Binguo Bao wrote:
>
> > +/*
> > + * Support for de-TOASTing toasted value iteratively. "need" is a
> pointer
> > + * between the beginning and end of iterator's ToastBuffer. The marco
> > + * de-TOAST all bytes before "need" into iterator's ToastBuffer.
> > + */
> > +#define PG_DETOAST_ITERATE(iter, need)
>  \
> > + do {
>
>   \
> > + Assert((need) >= (iter)->buf->buf && (need) <=
> (iter)->buf->capacity);  \
> > + while (!(iter)->done && (need) >= (iter)->buf->limit) {
>  \
> > + detoast_iterate(iter);
>   \
> > + }
>
>  \
> > + } while (0)
> >  /* WARNING -- unaligned pointer */
> >  #define PG_DETOAST_DATUM_PACKED(datum) \
> >   pg_detoast_datum_packed((struct varlena *) DatumGetPointer(datum))
>
> In broad terms this patch looks pretty good to me.  I only have a small
> quibble with this API definition in fmgr.h -- namely that it forces us
> to export the definition of all the structs (that could otherwise be
> private to toast_internals.h) in order to satisfy callers of this macro.
> I am wondering if it would be possible to change detoast_iterate and
> PG_DETOAST_ITERATE in a way that those details remain hidden -- I am
> thinking something like "if this returns NULL, then iteration has
> finished"; and relieve the macro from doing the "->buf->buf" and
> "->buf->limit" checks.  I think changing that would require a change in
> how the rest of the code is structured around this (the textpos internal
> function), but seems like it would be better overall.
>
> (AFAICS that would enable us to expose much less about the
> iterator-related structs to detoast.h -- you should be able to move the
> struct defs to toast_internals.h)
>
> Then again, it might be just wishful thinking, but it seems worth
> considering at least.
>
> --
> Álvaro Herrerahttps://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>

I've tied to hide the details of the struct in patch v11 with checking
"need" pointer
inside detoast_iterate function. I also compared the performance of the two
versions.

 patch v10  patch v11
comp. beg.1413ms  1489ms
comp. end   24327ms28011ms
uncomp. beg. 1439ms  1432ms
uncomp. end 25019ms    29007ms

We can see that v11 is about 15% slower than v10 on suffix queries since
this involves
the complete de-TOASTing and detoast_iterate() function is called
frequently in v11.

Personally, I prefer patch v10. Its performance is superior, although it
exposes some struct details.

-- 
Best regards,
Binguo Bao
From 8f6381d272816175d3e681c9d1cd8c6b5f27f44f Mon Sep 17 00:00:00 2001
From: BBG 
Date: Tue, 4 Jun 2019 22:56:42 +0800
Subject: [PATCH] de-TOASTing using a iterator

---
 src/backend/access/common/detoast.c | 110 +
 src/backend/access/common/toast_internals.c | 355 
 src/backend/utils/adt/varlena.c |  29 ++-
 src/include/access/detoast.h|  32 +++
 src/include/access/toast_internals.h|  75 ++
 src/include/fmgr.h  |   8 +
 6 files changed, 603 insertions(+), 6 deletions(-)

diff --git a/src/backend/access/common/detoast.c b/src/backend/access/common/detoast.c
index c8b49d6..f437d27 100644
--- a/src/backend/access/common/detoast.c
+++ b/src/backend/access/common/detoast.c
@@ -290,6 +290,116 @@ heap_tuple_untoast_attr_slice(struct varlena *attr,
 }
 
 /* --
+ * create_detoast_iterator -
+ *
+ * It only makes sense to initialize a de-TOAST iterator for external on-disk values.
+ *
+ * --
+ */
+DetoastIterator
+create_detoast_iterator(struct varlena *attr)
+{
+	struct varatt_external toast_pointer;
+	DetoastIterator iter;
+	if (VARATT_IS_EXTERNAL_ONDISK(attr))
+	{
+		iter = (DetoastIterator) palloc0(sizeof(DetoastIteratorData));
+		iter->done = false;
+
+		/* This is an externally stored datum --- initialize fetch datum iterator */
+		iter->fetch_datum_iterator = create_fetch_datum_iterator(attr);
+		VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
+		if (VARATT_EXTERNAL_IS_COMPRESSED(toast_pointer))
+		{
+			iter->compressed = true;
+
+			/* prepare buffer to received decompressed data */
+			iter->buf = create_toast_buffer(toast_pointer.va_rawsize, false);
+
+			/* initialize state for pglz_decompress_iterate() */
+			iter->ctrl = 0;
+			it

Re: [proposal] de-TOAST'ing using a iterator

2019-09-10 Thread Binguo Bao
ens makes the macro more reliable. Done.

> +void free_detoast_iterator(DetoastIterator iter)
> > +{
> > + if (iter == NULL)
> > + {
> > + return;
> > + }
>
> If this function is going to do this, why do callers need to check for
> NULL also?  Seems pointless.  I'd rather make callers simpler and keep
> only the NULL-check inside the function, since this is not perf-critical
> anyway.
>

Good catch. Done.

 > + iter->fetch_datum_iterator =
create_fetch_datum_iterator(attr);

> > + VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
> > + if (VARATT_EXTERNAL_IS_COMPRESSED(toast_pointer))
> > + {
> > [...]
> > + }
> > + else
> > + {
> > + iter->compressed = false;
> > +
> > + /* point the buffer directly at the raw data */
> > + iter->buf = iter->fetch_datum_iterator->buf;
> > + }
>
> This arrangement where there are two ToastBuffers and they sometimes are
> the same is cute, but I think we need a better way to know when each
> needs to be freed afterwards;
>

We only need to check the "compressed" field in the iterator to figure out
which buffer should be freed.

-- 
Best regards,
Binguo Bao
From 90b8ee9af981a3d7a36edb275f7c8c7d3fdebfb6 Mon Sep 17 00:00:00 2001
From: BBG 
Date: Tue, 4 Jun 2019 22:56:42 +0800
Subject: [PATCH] de-TOASTing using a iterator

---
 src/backend/access/common/detoast.c | 103 
 src/backend/access/common/toast_internals.c | 355 
 src/backend/utils/adt/varlena.c |  29 ++-
 src/include/access/detoast.h|  97 
 src/include/access/toast_internals.h|  10 +
 src/include/fmgr.h  |  13 +
 6 files changed, 601 insertions(+), 6 deletions(-)

diff --git a/src/backend/access/common/detoast.c b/src/backend/access/common/detoast.c
index c8b49d6..905deac 100644
--- a/src/backend/access/common/detoast.c
+++ b/src/backend/access/common/detoast.c
@@ -290,6 +290,109 @@ heap_tuple_untoast_attr_slice(struct varlena *attr,
 }
 
 /* --
+ * create_detoast_iterator -
+ *
+ * It only makes sense to initialize a de-TOAST iterator for external on-disk values.
+ *
+ * --
+ */
+DetoastIterator
+create_detoast_iterator(struct varlena *attr)
+{
+	struct varatt_external toast_pointer;
+	DetoastIterator iter;
+	if (VARATT_IS_EXTERNAL_ONDISK(attr))
+	{
+		iter = (DetoastIterator) palloc0(sizeof(DetoastIteratorData));
+		iter->done = false;
+
+		/* This is an externally stored datum --- initialize fetch datum iterator */
+		iter->fetch_datum_iterator = create_fetch_datum_iterator(attr);
+		VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
+		if (VARATT_EXTERNAL_IS_COMPRESSED(toast_pointer))
+		{
+			iter->compressed = true;
+
+			/* prepare buffer to received decompressed data */
+			iter->buf = create_toast_buffer(toast_pointer.va_rawsize, false);
+
+			/* initialize state for pglz_decompress_iterate() */
+			iter->ctrl = 0;
+			iter->ctrlc = INVALID_CTRLC;
+		}
+		else
+		{
+			iter->compressed = false;
+
+			/* point the buffer directly at the raw data */
+			iter->buf = iter->fetch_datum_iterator->buf;
+		}
+		return iter;
+	}
+	else if (VARATT_IS_EXTERNAL_INDIRECT(attr))
+	{
+		/* indirect pointer --- dereference it */
+		struct varatt_indirect redirect;
+
+		VARATT_EXTERNAL_GET_POINTER(redirect, attr);
+		attr = (struct varlena *) redirect.pointer;
+
+		/* nested indirect Datums aren't allowed */
+		Assert(!VARATT_IS_EXTERNAL_INDIRECT(attr));
+
+		/* recurse in case value is still extended in some other way */
+		return create_detoast_iterator(attr);
+
+	}
+	else
+		/* in-line value -- no iteration used, even if it's compressed */
+		return NULL;
+}
+
+/* --
+ * free_detoast_iterator -
+ *
+ * Free memory used by the de-TOAST iterator, including buffers and
+ * fetch datum iterator.
+ * --
+ */
+void
+free_detoast_iterator(DetoastIterator iter)
+{
+	if (iter == NULL)
+		return;
+	if (iter->compressed)
+		free_toast_buffer(iter->buf);
+	free_fetch_datum_iterator(iter->fetch_datum_iterator);
+	pfree(iter);
+}
+
+/* --
+ * detoast_iterate -
+ *
+ * Iterate through the toasted value referenced by iterator.
+ *
+ * As long as there is another data chunk in external storage,
+ * de-TOAST it into iterator's toast buffer.
+ * --
+ */
+void
+detoast_iterate(DetoastIterator detoast_iter)
+{
+	FetchDatumIterator fetch_iter = detoast_iter->fetch_datum_iterator;
+
+	Assert(detoast_iter != NULL && !detoast_iter->done);
+
+	fetch_datum_iterate(fetch_iter);
+
+	if (detoast_iter->compressed)
+		pglz_decompress_iterate(fetch_iter->buf, detoast_iter->buf, detoast_iter);
+

Re: [proposal] de-TOAST'ing using a iterator

2019-08-21 Thread Binguo Bao
John Naylor  于2019年8月19日周一 下午12:55写道:

> init_toast_buffer():
>
> + * Note the constrain buf->position <= buf->limit may be broken
> + * at initialization. Make sure that the constrain is satisfied
> + * when consume chars.
>
> s/constrain/constraint/ (2 times)
> s/consume/consuming/
>
> Also, this comment might be better at the top the whole function?
>

The constraint is broken in the if branch, so I think put this comment in
the branch
is more precise.

The check
> if (iter->buf != iter->fetch_datum_iterator->buf)
> is what we need to do for the compressed case. Could we use this
> directly instead of having a separate state variable iter->compressed,
> with a macro like this?
> #define TOAST_ITER_COMPRESSED(iter) \
> (iter->buf != iter->fetch_datum_iterator->buf)


 The logic of the macro may be hard to understand, so I think it's ok to
just check the compressed state variable.

+ * If "ctrlc" field in iterator is equal to INVALID_CTRLC, it means that
> + * the field is invalid and need to read the control byte from the
> + * source buffer in the next iteration, see pglz_decompress_iterate().
> + */
> +#define INVALID_CTRLC 8
>
> I think the macro might be better placed in pg_lzcompress.h, and for
> consistency used in pglz_decompress(). Then the comment can be shorter
> and more general. With my additional comment in
> init_detoast_iterator(), hopefully it will be clear to readers.
>

The main role of this macro is to explain the iterator's "ctrlc" state, IMO
it's reasonable to put
the macro and definition of de-TOAST iterator together.

Thanks for your suggestion, I have updated the patch.
-- 
Best regards,
Binguo Bao
From d8d14600e2be0c11f233bf3302a823a8c4f19c2a Mon Sep 17 00:00:00 2001
From: BBG 
Date: Tue, 4 Jun 2019 22:56:42 +0800
Subject: [PATCH] de-TOASTing using a iterator

---
 src/backend/access/heap/tuptoaster.c | 472 +++
 src/backend/utils/adt/varlena.c  |  42 +++-
 src/include/access/tuptoaster.h  |  97 +++
 src/include/fmgr.h   |   7 +
 4 files changed, 612 insertions(+), 6 deletions(-)

diff --git a/src/backend/access/heap/tuptoaster.c b/src/backend/access/heap/tuptoaster.c
index 74233bb..ba05fd1 100644
--- a/src/backend/access/heap/tuptoaster.c
+++ b/src/backend/access/heap/tuptoaster.c
@@ -83,6 +83,13 @@ static int	toast_open_indexes(Relation toastrel,
 static void toast_close_indexes(Relation *toastidxs, int num_indexes,
 LOCKMODE lock);
 static void init_toast_snapshot(Snapshot toast_snapshot);
+static FetchDatumIterator create_fetch_datum_iterator(struct varlena *attr);
+static void free_fetch_datum_iterator(FetchDatumIterator iter);
+static void fetch_datum_iterate(FetchDatumIterator iter);
+static void init_toast_buffer(ToastBuffer *buf, int size, bool compressed);
+static void free_toast_buffer(ToastBuffer *buf);
+static void pglz_decompress_iterate(ToastBuffer *source, ToastBuffer *dest,
+	DetoastIterator iter);
 
 
 /* --
@@ -347,6 +354,119 @@ heap_tuple_untoast_attr_slice(struct varlena *attr,
 
 
 /* --
+ * init_detoast_iterator -
+ *
+ * It only makes sense to initialize a de-TOAST iterator for external on-disk values.
+ *
+ * --
+ */
+bool init_detoast_iterator(struct varlena *attr, DetoastIterator iter)
+{
+	struct varatt_external toast_pointer;
+	if (VARATT_IS_EXTERNAL_ONDISK(attr))
+	{
+		iter->done = false;
+
+		/*
+		 * This is an externally stored datum --- initialize fetch datum iterator
+		 */
+		iter->fetch_datum_iterator = create_fetch_datum_iterator(attr);
+		VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
+		if (VARATT_EXTERNAL_IS_COMPRESSED(toast_pointer))
+		{
+			iter->compressed = true;
+
+			/* prepare buffer to received decompressed data */
+			iter->buf = (ToastBuffer *) palloc0(sizeof(ToastBuffer));
+			init_toast_buffer(iter->buf, toast_pointer.va_rawsize, false);
+
+			/* initialize state for pglz_decompress_iterate() */
+			iter->ctrl = 0;
+			iter->ctrlc = INVALID_CTRLC;
+		}
+		else
+		{
+			iter->compressed = false;
+
+			/* point the buffer directly at the raw data */
+			iter->buf = iter->fetch_datum_iterator->buf;
+		}
+		return true;
+	}
+	else if (VARATT_IS_EXTERNAL_INDIRECT(attr))
+	{
+		 /* indirect pointer --- dereference it */
+		struct varatt_indirect redirect;
+
+		VARATT_EXTERNAL_GET_POINTER(redirect, attr);
+		attr = (struct varlena *) redirect.pointer;
+
+		/* nested indirect Datums aren't allowed */
+		Assert(!VARATT_IS_EXTERNAL_INDIRECT(attr));
+
+		/* recurse in case value is still extended in some other way */
+		return init_detoast_iterator(attr, iter);
+
+	}
+	else
+	{
+		/* in-line value -- no iteration used, even if it's compressed */
+		return false;
+	}
+}
+
+
+/* --
+ * free_detoast_iterator -
+ *
+ * Free memory used by the de-TOAST iterator, in

Re: [proposal] de-TOAST'ing using a iterator

2019-08-17 Thread Binguo Bao
Hi John,

> >> Also, I don't think
> >> the new logic for the ctrl/c variables is an improvement:
> >>
> >> 1. iter->ctrlc is intialized with '8' (even in the uncompressed case,
> >> which is confusing). Any time you initialize with something not 0 or
> >> 1, it's a magic number, and here it's far from where the loop variable
> >> is used. This is harder to read.
> >
> > `iter->ctrlc` is used to record the value of `ctrl` in pglz_decompress
> at the end of
> > the last iteration(or loop). In the pglz_decompress, `ctrlc`’s valid
> value is 0~7,
> > When `ctrlc` reaches 8,  a control byte is read from the source
> > buffer to `ctrl` then set `ctrlc` to 0. And a control bytes should be
> read from the
> > source buffer to `ctrlc` on the first iteration. So `iter->ctrlc` should
> be intialized with '8'.
>
> My point here is it looks strange out of context, but "0" looked
> normal. Maybe a comment in init_detoast_buffer(), something like "8
> means read a control byte from the source buffer on the first
> iteration, see pg_lzdecompress_iterate()".
>
> Or, possibly, we could have a macro like INVALID_CTRLC. That might
> even improve the readability of the original function. This is just an
> idea, and maybe others would disagree, so you don't need to change it
> for now.
>

All in all, the idea is much better than a magic number 8. So, I've
implemented it.


> At this point, there are no functional things that I think we need to
> change. It's close to ready-for-committer. For the next version, I'd
> like you go through the comments and edit for grammar, spelling, and
> clarity as you see fit. I know you're not a native speaker of English,
> so I can help you with anything that remains.


I've tried my best to improve the comments, but there should be room for
further improvement
I hope you can help me perfect it.


> Also note we use braces
> on their own lines
> {
> like this
> }
>
> Done.
-- 
Best regards,
Binguo Bao
From 13648adb56b96a75910bb5d0a8e21d358b266d51 Mon Sep 17 00:00:00 2001
From: BBG 
Date: Tue, 4 Jun 2019 22:56:42 +0800
Subject: [PATCH] de-TOASTing using a iterator

---
 src/backend/access/heap/tuptoaster.c | 470 +++
 src/backend/utils/adt/varlena.c  |  40 ++-
 src/include/access/tuptoaster.h  |  97 
 src/include/fmgr.h   |   7 +
 4 files changed, 608 insertions(+), 6 deletions(-)

diff --git a/src/backend/access/heap/tuptoaster.c b/src/backend/access/heap/tuptoaster.c
index 74233bb..a8924f1 100644
--- a/src/backend/access/heap/tuptoaster.c
+++ b/src/backend/access/heap/tuptoaster.c
@@ -83,6 +83,13 @@ static int	toast_open_indexes(Relation toastrel,
 static void toast_close_indexes(Relation *toastidxs, int num_indexes,
 LOCKMODE lock);
 static void init_toast_snapshot(Snapshot toast_snapshot);
+static FetchDatumIterator create_fetch_datum_iterator(struct varlena *attr);
+static bool free_fetch_datum_iterator(FetchDatumIterator iter);
+static void fetch_datum_iterate(FetchDatumIterator iter);
+static void init_toast_buffer(ToastBuffer *buf, int size, bool compressed);
+static bool free_toast_buffer(ToastBuffer *buf);
+static void pglz_decompress_iterate(ToastBuffer *source, ToastBuffer *dest,
+	DetoastIterator iter);
 
 
 /* --
@@ -347,6 +354,125 @@ heap_tuple_untoast_attr_slice(struct varlena *attr,
 
 
 /* --
+ * init_detoast_iterator -
+ *
+ * The "iterator" variable is normally just a local variable in the caller.
+ * It only make sense to initialize de-TOAST iterator for external on-disk value.
+ *
+ * --
+ */
+bool init_detoast_iterator(struct varlena *attr, DetoastIterator iterator)
+{
+	struct varatt_external toast_pointer;
+	if (VARATT_IS_EXTERNAL_ONDISK(attr))
+	{
+		/*
+		 * This is an externally stored datum --- initialize fetch datum iterator
+		 */
+		iterator->fetch_datum_iterator = create_fetch_datum_iterator(attr);
+		VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
+		if (VARATT_EXTERNAL_IS_COMPRESSED(toast_pointer))
+		{
+			/* If it's compressed, prepare buffer for raw data */
+			iterator->buf = (ToastBuffer *) palloc0(sizeof(ToastBuffer));
+			init_toast_buffer(iterator->buf, toast_pointer.va_rawsize, false);
+			iterator->ctrl = 0;
+			iterator->ctrlc = INVALID_CTRLC;
+			iterator->compressed = true;
+			iterator->done = false;
+		}
+		else
+		{
+			iterator->buf = iterator->fetch_datum_iterator->buf;
+			iterator->ctrl = 0;
+			iterator->ctrlc = INVALID_CTRLC;
+			iterator->compressed = false;
+			iterator->done = false;
+		}
+		return true;
+	}
+	else if (VARATT_IS_EXTERNAL_INDIRECT(attr))
+	{
+		/*
+		 * This is an indirect pointer --- dereference it
+		 */
+		struct varatt_indirect redirect;
+
+		VARAT

Re: [proposal] de-TOAST'ing using a iterator

2019-08-03 Thread Binguo Bao
John Naylor  于2019年8月2日周五 下午3:12写道:

> On Tue, Jul 30, 2019 at 8:20 PM Binguo Bao  wrote:
> >
> > John Naylor  于2019年7月29日周一 上午11:49写道:
> >>
> >> 1). For every needle comparison, text_position_next_internal()
> >> calculates how much of the value is needed and passes that to
> >> detoast_iterate(), which then calculates if it has to do something or
> >> not. This is a bit hard to follow. There might also be a performance
> >> penalty -- the following is just a theory, but it sounds plausible:
> >> The CPU can probably correctly predict that detoast_iterate() will
> >> usually return the same value it did last time, but it still has to
> >> call the function and make sure, which I imagine is more expensive
> >> than advancing the needle. Ideally, we want to call the iterator only
> >> if we have to.
> >>
> >> In the attached patch (applies on top of your v5),
> >> text_position_next_internal() simply compares hptr to the detoast
> >> buffer limit, and calls detoast_iterate() until it can proceed. I
> >> think this is clearer.
> >
> >
> > Yes, I think this is a general scenario where the caller continually
> > calls detoast_iterate until gets enough data, so I think such operations
> can
> > be extracted as a macro, as I did in patch v6. In the macro, the
> detoast_iterate
> > function is called only when the data requested by the caller is greater
> than the
> > buffer limit.
>
> I like the use of a macro here. However, I think we can find a better
> location for the definition. See the header comment of fmgr.h:
> "Definitions for the Postgres function manager and function-call
> interface." Maybe tuptoaster.h is as good a place as any?
>

PG_DETOAST_ITERATE isn't a sample function-call interface,
But I notice that PG_FREE_IF_COPY is also defined in fmgr.h, whose logic is
similar to PG_DETOAST_ITERATE, make condition check first and then
decide whether to call the function. Besides, PG_DETOAST_DATUM,
PG_DETOAST_DATUM_COPY, PG_DETOAST_DATUM_SLICE,
PG_DETOAST_DATUM_PACKED are all defined in fmgr.h, it is reasonable
to put all the de-TOAST interface together.

>> 2). detoast_iterate() and fetch_datum_iterate() return a value but we
> >> don't check it or do anything with it. Should we do something with it?
> >> It's also not yet clear if we should check the iterator state instead
> >> of return values. I've added some XXX comments as a reminder. We
> >> should also check the return value of pglz_decompress_iterate().
> >
> >
> > IMO, we need to provide users with a simple iterative interface.
> > Using the required data pointer to compare with the buffer limit is an
> easy way.
> > And the application scenarios of the iterator are mostly read operations.
> > So I think there is no need to return a value, and the iterator needs to
> throw an
> > exception for some wrong calls, such as all the data have been iterated,
> > but the user still calls the iterator.
>
> Okay, and see these functions now return void. The orignal
> pglz_decompress() returned a value that was check against corruption.
> Is there a similar check we can do for the iterative version?
>

As far as I know, we can just do such check after all compressed data is
decompressed.
If we are slicing, we can't do the check.


>
> >> 3). Speaking of pglz_decompress_iterate(), I diff'd it with
> >> pglz_decompress(), and I have some questions on it:
> >>
> >> a).
> >> + srcend = (const unsigned char *) (source->limit == source->capacity
> >> ? source->limit : (source->limit - 4));
> >>
> >> What does the 4 here mean in this expression?
> >
> >
> > Since we fetch chunks one by one, if we make srcend equals to the source
> buffer limit,
> > In the while loop "while (sp < srcend && dp < destend)", sp may exceed
> the source buffer limit and read unallocated bytes.
>
> Why is this? That tells me the limit is incorrect. Can the setter not
> determine the right value?
>

There are three statments change `sp` value in the while loop `while (sp <
srcend && dp < destend)`:
`ctrl = *sp++;`
`off = ((sp[0]) & 0xf0) << 4) | sp[1]; sp += 2;`
`len += *sp++`
Although we make sure `sp` is less than `srcend` when enter while loop,
`sp` is likely to
go beyond the `srcend` in the loop, and we should ensure that `sp` is
always smaller than `buf->limit` to avoid
reading unallocated data. So, `srcend` can't be initialized to
`buf->limit`. Only one case is exceptional,
we've fetched all data chunks and 'buf->limit' reaches 'buf->capacity',
it's

Re: [proposal] de-TOAST'ing using a iterator

2019-07-30 Thread Binguo Bao
John Naylor  于2019年7月29日周一 上午11:49写道:

> On Thu, Jul 25, 2019 at 10:21 PM Binguo Bao  wrote:
> My goal for this stage of review was to understand more fully what the
>
code is doing, and make it as simple and clear as possible, starting
> at the top level. In doing so, it looks like I found some additional
> performance gains. I haven't looked much yet at the TOAST fetching
> logic.
>
>
> 1). For every needle comparison, text_position_next_internal()
> calculates how much of the value is needed and passes that to
> detoast_iterate(), which then calculates if it has to do something or
> not. This is a bit hard to follow. There might also be a performance
> penalty -- the following is just a theory, but it sounds plausible:
> The CPU can probably correctly predict that detoast_iterate() will
> usually return the same value it did last time, but it still has to
> call the function and make sure, which I imagine is more expensive
> than advancing the needle. Ideally, we want to call the iterator only
> if we have to.
>
> In the attached patch (applies on top of your v5),
> text_position_next_internal() simply compares hptr to the detoast
> buffer limit, and calls detoast_iterate() until it can proceed. I
> think this is clearer.


Yes, I think this is a general scenario where the caller continually
calls detoast_iterate until gets enough data, so I think such operations can
be extracted as a macro, as I did in patch v6. In the macro, the
detoast_iterate
function is called only when the data requested by the caller is greater
than the
buffer limit.

(I'm not sure of the error handling, see #2.)
> In this scheme, the only reason to know length is to pass to
> pglz_decompress_iterate() in the case of in-line compression. As I
> alluded to in my first review, I don't think it's worth the complexity
> to handle that iteratively since the value is only a few kB. I made it
> so in-line datums are fully decompressed as in HEAD and removed struct
> members to match.


Sounds good. This not only simplifies the structure and logic of Detoast
Iterator
but also has no major impact on efficiency.


> I also noticed that no one updates or looks at
> "toast_iter.done" so I removed that as well.
>

toast_iter.done is updated when the buffer limit reached the buffer
capacity now.
So, I added it back.


> Now pglz_decompress_iterate() doesn't need length at all. For testing
> I just set decompress_all = true and let the compiler optimize away
> the rest. I left finishing it for you if you agree with these changes.
>

Done.


> 2). detoast_iterate() and fetch_datum_iterate() return a value but we
> don't check it or do anything with it. Should we do something with it?
> It's also not yet clear if we should check the iterator state instead
> of return values. I've added some XXX comments as a reminder. We
> should also check the return value of pglz_decompress_iterate().
>

IMO, we need to provide users with a simple iterative interface.
Using the required data pointer to compare with the buffer limit is an easy
way.
And the application scenarios of the iterator are mostly read operations.
So I think there is no need to return a value, and the iterator needs to
throw an
exception for some wrong calls, such as all the data have been iterated,
but the user still calls the iterator.


>
> 3). Speaking of pglz_decompress_iterate(), I diff'd it with
> pglz_decompress(), and I have some questions on it:
>
> a).
> + srcend = (const unsigned char *) (source->limit == source->capacity
> ? source->limit : (source->limit - 4));
>
> What does the 4 here mean in this expression?


Since we fetch chunks one by one, if we make srcend equals to the source
buffer limit,
In the while loop "while (sp < srcend && dp < destend)", sp may exceed the
source buffer limit and
read unallocated bytes. Giving a four-byte buffer can prevent sp from
exceeding the source buffer limit.
If we have read all the chunks, we don't need to be careful to cross the
border,
just make srcend equal to source buffer limit. I've added comments to
explain it in patch v6.



> Is it possible it's
> compensating for this bit in init_toast_buffer()?
>
> + buf->limit = VARDATA(buf->buf);
>
> It seems the initial limit should also depend on whether the datum is
> compressed, right? Can we just do this:
>
> + buf->limit = buf->position;
>

I'm afraid not. buf->position points to the data portion of the buffer, but
the beginning of
the chunks we read may contain header information. For example, for
compressed data chunks,
the first four bytes record the size of raw data, this means that limit is
four bytes ahead of position.
This initialization doesn't cause errors, although the position is less
than the limit in other cases.
Because we alw

Re: [proposal] de-TOAST'ing using a iterator

2019-07-25 Thread Binguo Bao
Hi John!
Sorry for the late reply. It took me some time to fix a random bug.

In the case where we don't know the slice size, how about the other
> aspect of my question above: Might it be simpler and less overhead to
> decompress entire chunks at a time? If so, I think it would be
> enlightening to compare performance.


Good idea. I've tested  your propopal with scripts and patch v5 in the
attachment:

  master  patch v4  patch v5
comp. beg.4364ms  1505ms  1529ms
comp. end   28321ms31202ms  26916ms
uncomp. beg. 3474ms  1513ms  1523ms
uncomp. end 27416ms30260ms  25888ms

The proposal improves suffix query performance greatly
with less calls to the decompression function.

Besides, do you have any other suggestions for the structure of
DetoastIterator or ToastBuffer?
Maybe they can be designed to be more reasonable.

Thanks again for the proposal.
-- 
Best regards,
Binguo Bao


init-test.sh
Description: application/shellscript


iterator-test.sh
Description: application/shellscript
From 19deb6d7609a156cb14571d83c27d72f3ecb5aa8 Mon Sep 17 00:00:00 2001
From: BBG 
Date: Tue, 4 Jun 2019 22:56:42 +0800
Subject: [PATCH] de-TOASTing using a iterator

---
 src/backend/access/heap/tuptoaster.c | 498 +++
 src/backend/utils/adt/varlena.c  |  49 ++--
 src/include/access/tuptoaster.h  |  92 +++
 3 files changed, 623 insertions(+), 16 deletions(-)

diff --git a/src/backend/access/heap/tuptoaster.c b/src/backend/access/heap/tuptoaster.c
index 55d6e91..fb52bfb 100644
--- a/src/backend/access/heap/tuptoaster.c
+++ b/src/backend/access/heap/tuptoaster.c
@@ -83,6 +83,13 @@ static int	toast_open_indexes(Relation toastrel,
 static void toast_close_indexes(Relation *toastidxs, int num_indexes,
 LOCKMODE lock);
 static void init_toast_snapshot(Snapshot toast_snapshot);
+static FetchDatumIterator create_fetch_datum_iterator(struct varlena *attr);
+static bool free_fetch_datum_iterator(FetchDatumIterator iter);
+static int32 fetch_datum_iterate(FetchDatumIterator iter);
+static void init_toast_buffer(ToastBuffer *buf, int size, bool compressed);
+static bool free_toast_buffer(ToastBuffer *buf);
+static int32 pglz_decompress_iterate(ToastBuffer *source, ToastBuffer *dest,
+	 DetoastIterator iter, int32 length);
 
 
 /* --
@@ -347,6 +354,147 @@ heap_tuple_untoast_attr_slice(struct varlena *attr,
 
 
 /* --
+ * create_detoast_iterator -
+ *
+ * Initialize detoast iterator.
+ * --
+ */
+DetoastIterator create_detoast_iterator(struct varlena *attr) {
+	struct varatt_external toast_pointer;
+	DetoastIterator iterator = NULL;
+	if (VARATT_IS_EXTERNAL_ONDISK(attr))
+	{
+		/*
+		 * This is an externally stored datum --- create fetch datum iterator
+		 */
+		iterator = (DetoastIterator) palloc0(sizeof(DetoastIteratorData));
+		iterator->fetch_datum_iterator = create_fetch_datum_iterator(attr);
+		VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
+		if (VARATT_EXTERNAL_IS_COMPRESSED(toast_pointer))
+		{
+			/* If it's compressed, prepare buffer for raw data */
+			iterator->buf = (ToastBuffer *) palloc0(sizeof(ToastBuffer));
+			init_toast_buffer(iterator->buf, toast_pointer.va_rawsize, false);
+			iterator->source = NULL;
+			iterator->ctrlc = 0;
+			iterator->compressed = true;
+			iterator->done = false;
+		}
+		else
+		{
+			iterator->buf = iterator->fetch_datum_iterator->buf;
+			iterator->source = NULL;
+			iterator->ctrlc = 0;
+			iterator->compressed = false;
+			iterator->done = false;
+		}
+	}
+	else if (VARATT_IS_EXTERNAL_INDIRECT(attr))
+	{
+		/*
+		 * This is an indirect pointer --- dereference it
+		 */
+		struct varatt_indirect redirect;
+
+		VARATT_EXTERNAL_GET_POINTER(redirect, attr);
+		attr = (struct varlena *) redirect.pointer;
+
+		/* nested indirect Datums aren't allowed */
+		Assert(!VARATT_IS_EXTERNAL_INDIRECT(attr));
+
+		/* recurse in case value is still extended in some other way */
+		iterator = create_detoast_iterator(attr);
+
+	}
+	else if (VARATT_IS_COMPRESSED(attr))
+	{
+		/*
+		 * This is a compressed value inside of the main tuple
+		 */
+		iterator = (DetoastIterator) palloc0(sizeof(DetoastIteratorData));
+		iterator->fetch_datum_iterator = NULL;
+		iterator->source = palloc0(sizeof(ToastBuffer));
+		iterator->source->buf = (const char*) attr;
+		iterator->source->position = TOAST_COMPRESS_RAWDATA(attr);
+		iterator->source->limit = (char *)attr + VARSIZE(attr);
+		iterator->source->capacity = iterator->source->limit;
+
+		iterator->buf = palloc0(sizeof(ToastBuffer));
+		init_toast_buffer(iterator->buf, TOAST_COMPRESS_RAWSIZE(attr) + VARHDRSZ, false);
+
+		iterator->ctrlc = 0;
+		iterator->compressed = true;
+		iterator->done = false;
+	}
+
+	return iterator;
+}
+
+
+/* --
+ * free_detoast

[GSoC] Second period status report for the de-TOAST iterator

2019-07-23 Thread Binguo Bao
Hi hackers,
It's a report about my GSoC project[1] during the second period.

What I've done:
- Submitted the patch[2] to optimize partial TOAST decompression
- Submitted the patch[3] as a prototype of de-TOAST iterator and tested its
performance on position() function
- Perfected patches according to hacker's review comments

What I'm doing:
- Make overall tests on the iterator
- Applying the de-TOAST iterator to the json/jsonb data type

Thanks to hackers and mentors for helping me advance this project.

-- 
Best regards,
Binguo Bao

[1] https://summerofcode.withgoogle.com/projects/#6467011507388416
[2]
https://www.postgresql.org/message-id/flat/CAL-OGkthU9Gs7TZchf5OWaL-Gsi=hxquftxkv9qpng73d5n...@mail.gmail.com
[3]
https://www.postgresql.org/message-id/flat/cal-ogks_onzpc9m9bxpcztmofwulcfkyecekiagxzwrl8kx...@mail.gmail.com


Re: [proposal] de-TOAST'ing using a iterator

2019-07-16 Thread Binguo Bao
Hi, John

First, I'd like to advocate for caution when using synthetic
> benchmarks involving compression. Consider this test:
> insert into detoast_c (a)
> select
> 'abc'||
> repeat(
> (SELECT string_agg(md5(chr(i)), '')
> FROM generate_series(1,127) i)
> , 1)
> ||'xyz'
> from generate_series(1,100);
> The results for the uncompressed case were not much different then
> your test. However, in the compressed case the iterator doesn't buy us
> much with beginning searches since full decompression is already fast:
>  master  patch
> comp. beg.869ms  837ms
> comp. end   14100ms16100ms
> uncomp. beg. 6360ms  800ms
> uncomp. end 21100ms21400ms
> and with compression it's 14% slower searching to the end. This is
> pretty contrived, but I include it for demonstration.


I've reproduced the test case with test scripts in the attachment on my
laptop:

 master  patch
comp. beg.2686.77 ms  1532.79 ms
comp. end 17971.8 ms  21206.3 ms
uncomp. beg.8358.79 ms  1556.93 ms
uncomp. end 23559.7 ms  22547.1 ms

In the compressed beginning case, the test result is different from yours
since the patch is ~1.75x faster
rather than no improvement. The interesting thing is that the patch if 4%
faster than master in the uncompressed end case.
I can't figure out reason now.

Reading the thread where you're working on optimizing partial
> decompression [1], it seems you have two separate solutions for the
> two problems. Maybe this is fine, but I'd like to bring up the
> possibility of using the same approach for both kinds of callers.



> I'm not an expert on TOAST, but maybe one way to solve both problems
> is to work at the level of whole TOAST chunks. In that case, the
> current patch would look like this:
> 1. The caller requests more of the attribute value from the de-TOAST
> iterator.
> 2. The iterator gets the next chunk and either copies or decompresses
> the whole chunk into the buffer. (If inline, just decompress the whole
> thing)


Thanks for your suggestion. It is indeed possible to implement
PG_DETOAST_DATUM_SLICE using the de-TOAST iterator.
IMO the iterator is more suitable for situations where the caller doesn't
know the slice size. If the caller knows the slice size,
it is reasonable to fetch enough chunks at once and then decompress it at
once.
 --
Best regards,
Binguo Bao


init-test.sh
Description: application/shellscript


iterator-test.sh
Description: application/shellscript


Re: [proposal] de-TOAST'ing using a iterator

2019-07-11 Thread Binguo Bao
I have set the local build configuration to be the same as on the CI. This
patch should be correct.

Best regards,
Binguo Bao

Binguo Bao  于2019年7月11日周四 上午12:39写道:

> This is the patch that fix warnings.
>
> Best Regards,
> Binguo Bao
>
> Binguo Bao  于2019年7月10日周三 下午10:18写道:
>
>> Hi Thomas,
>> I've fixed the warnings.
>>
>> Thomas Munro  于2019年7月5日周五 下午12:21写道:
>>
>>> On Thu, Jun 20, 2019 at 1:51 AM Binguo Bao  wrote:
>>> > Hi hackers!
>>> > This proposal aims to provide the ability to de-TOAST a fully TOAST'd
>>> and compressed field using an iterator and then update the appropriate
>>> parts of the code to use the iterator where possible instead of
>>> de-TOAST'ing and de-compressing the entire value. Examples where this can
>>> be helpful include using position() from the beginning of the value, or
>>> doing a pattern or substring match.
>>> >
>>> > de-TOAST iterator overview:
>>> > 1. The caller requests the slice of the attribute value from the
>>> de-TOAST iterator.
>>> > 2. The de-TOAST iterator checks if there is a slice available in the
>>> output buffer, if there is, return the result directly,
>>> > otherwise goto the step3.
>>> > 3. The de-TOAST iterator checks if there is the slice available in the
>>> input buffer, if there is, goto step44. Otherwise,
>>> > call fetch_datum_iterator to fetch datums from disk to input
>>> buffer.
>>> > 4. If the data in the input buffer is compressed, extract some data
>>> from the input buffer to the output buffer until the caller's
>>> > needs are met.
>>> >
>>> > I've implemented the prototype and apply it to the position() function
>>> to test performance.
>>>
>>> Hi Binguo,
>>>
>>> Interesting work, and nice performance improvements so far.  Just by
>>> the way, the patch currently generates warnings:
>>>
>>> https://travis-ci.org/postgresql-cfbot/postgresql/builds/554345719
>>>
>>> --
>>> Thomas Munro
>>> https://enterprisedb.com
>>>
>>
From 556e7945ee6e9c6ceba723a2fc5c5f1d60a9f412 Mon Sep 17 00:00:00 2001
From: BBG 
Date: Tue, 4 Jun 2019 22:56:42 +0800
Subject: [PATCH] de-TOASTing using a iterator

---
 src/backend/access/heap/tuptoaster.c | 491 +++
 src/backend/utils/adt/varlena.c  |  49 ++--
 src/include/access/tuptoaster.h  |  92 +++
 3 files changed, 616 insertions(+), 16 deletions(-)

diff --git a/src/backend/access/heap/tuptoaster.c b/src/backend/access/heap/tuptoaster.c
index 55d6e91..8f7faf6 100644
--- a/src/backend/access/heap/tuptoaster.c
+++ b/src/backend/access/heap/tuptoaster.c
@@ -83,6 +83,13 @@ static int	toast_open_indexes(Relation toastrel,
 static void toast_close_indexes(Relation *toastidxs, int num_indexes,
 LOCKMODE lock);
 static void init_toast_snapshot(Snapshot toast_snapshot);
+static FetchDatumIterator create_fetch_datum_iterator(struct varlena *attr);
+static bool free_fetch_datum_iterator(FetchDatumIterator iter);
+static int32 fetch_datum_iterate(FetchDatumIterator iter);
+static void init_toast_buffer(ToastBuffer *buf, int size, bool compressed);
+static bool free_toast_buffer(ToastBuffer *buf);
+static int32 pglz_decompress_iterate(ToastBuffer *source, ToastBuffer *dest,
+	 DetoastIterator iter, int32 length);
 
 
 /* --
@@ -347,6 +354,145 @@ heap_tuple_untoast_attr_slice(struct varlena *attr,
 
 
 /* --
+ * create_detoast_iterator -
+ *
+ * Initialize detoast iterator.
+ * --
+ */
+DetoastIterator create_detoast_iterator(struct varlena *attr) {
+	struct varatt_external toast_pointer;
+	DetoastIterator iterator = NULL;
+	if (VARATT_IS_EXTERNAL_ONDISK(attr))
+	{
+		/*
+		 * This is an externally stored datum --- create fetch datum iterator
+		 */
+		iterator = (DetoastIterator) palloc0(sizeof(DetoastIteratorData));
+		iterator->fetch_datum_iterator = create_fetch_datum_iterator(attr);
+		VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
+		if (VARATT_EXTERNAL_IS_COMPRESSED(toast_pointer))
+		{
+			/* If it's compressed, prepare buffer for raw data */
+			iterator->buf = (ToastBuffer *) palloc0(sizeof(ToastBuffer));
+			init_toast_buffer(iterator->buf, toast_pointer.va_rawsize, false);
+			iterator->source = NULL;
+			iterator->ctrlc = 0;
+			iterator->compressed = true;
+			iterator->done = false;
+		}
+		else
+		{
+			iterator->buf = iterator->fetch_datum_iterator->buf;
+			iterator->source = NULL;
+			iterator->ctrlc = 0;
+			iterator->compressed = false;
+			iterator->done = false;
+		}
+	}
+	else if (VARATT_IS_EXTERNAL_INDIRECT(attr))
+	{
+		/*
+		 * This is

Re: [proposal] de-TOAST'ing using a iterator

2019-07-10 Thread Binguo Bao
This is the patch that fix warnings.

Best Regards,
Binguo Bao

Binguo Bao  于2019年7月10日周三 下午10:18写道:

> Hi Thomas,
> I've fixed the warnings.
>
> Thomas Munro  于2019年7月5日周五 下午12:21写道:
>
>> On Thu, Jun 20, 2019 at 1:51 AM Binguo Bao  wrote:
>> > Hi hackers!
>> > This proposal aims to provide the ability to de-TOAST a fully TOAST'd
>> and compressed field using an iterator and then update the appropriate
>> parts of the code to use the iterator where possible instead of
>> de-TOAST'ing and de-compressing the entire value. Examples where this can
>> be helpful include using position() from the beginning of the value, or
>> doing a pattern or substring match.
>> >
>> > de-TOAST iterator overview:
>> > 1. The caller requests the slice of the attribute value from the
>> de-TOAST iterator.
>> > 2. The de-TOAST iterator checks if there is a slice available in the
>> output buffer, if there is, return the result directly,
>> > otherwise goto the step3.
>> > 3. The de-TOAST iterator checks if there is the slice available in the
>> input buffer, if there is, goto step44. Otherwise,
>> > call fetch_datum_iterator to fetch datums from disk to input buffer.
>> > 4. If the data in the input buffer is compressed, extract some data
>> from the input buffer to the output buffer until the caller's
>> > needs are met.
>> >
>> > I've implemented the prototype and apply it to the position() function
>> to test performance.
>>
>> Hi Binguo,
>>
>> Interesting work, and nice performance improvements so far.  Just by
>> the way, the patch currently generates warnings:
>>
>> https://travis-ci.org/postgresql-cfbot/postgresql/builds/554345719
>>
>> --
>> Thomas Munro
>> https://enterprisedb.com
>>
>
From ea2eb65049e2da984f0a2c3475f7f4f92adc98ab Mon Sep 17 00:00:00 2001
From: BBG 
Date: Tue, 4 Jun 2019 22:56:42 +0800
Subject: [PATCH] de-TOASTing using a iterator

---
 src/backend/access/heap/tuptoaster.c | 491 +++
 src/backend/utils/adt/varlena.c  |  48 ++--
 src/include/access/tuptoaster.h  |  92 +++
 3 files changed, 615 insertions(+), 16 deletions(-)

diff --git a/src/backend/access/heap/tuptoaster.c b/src/backend/access/heap/tuptoaster.c
index 55d6e91..8f7faf6 100644
--- a/src/backend/access/heap/tuptoaster.c
+++ b/src/backend/access/heap/tuptoaster.c
@@ -83,6 +83,13 @@ static int	toast_open_indexes(Relation toastrel,
 static void toast_close_indexes(Relation *toastidxs, int num_indexes,
 LOCKMODE lock);
 static void init_toast_snapshot(Snapshot toast_snapshot);
+static FetchDatumIterator create_fetch_datum_iterator(struct varlena *attr);
+static bool free_fetch_datum_iterator(FetchDatumIterator iter);
+static int32 fetch_datum_iterate(FetchDatumIterator iter);
+static void init_toast_buffer(ToastBuffer *buf, int size, bool compressed);
+static bool free_toast_buffer(ToastBuffer *buf);
+static int32 pglz_decompress_iterate(ToastBuffer *source, ToastBuffer *dest,
+	 DetoastIterator iter, int32 length);
 
 
 /* --
@@ -347,6 +354,145 @@ heap_tuple_untoast_attr_slice(struct varlena *attr,
 
 
 /* --
+ * create_detoast_iterator -
+ *
+ * Initialize detoast iterator.
+ * --
+ */
+DetoastIterator create_detoast_iterator(struct varlena *attr) {
+	struct varatt_external toast_pointer;
+	DetoastIterator iterator = NULL;
+	if (VARATT_IS_EXTERNAL_ONDISK(attr))
+	{
+		/*
+		 * This is an externally stored datum --- create fetch datum iterator
+		 */
+		iterator = (DetoastIterator) palloc0(sizeof(DetoastIteratorData));
+		iterator->fetch_datum_iterator = create_fetch_datum_iterator(attr);
+		VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
+		if (VARATT_EXTERNAL_IS_COMPRESSED(toast_pointer))
+		{
+			/* If it's compressed, prepare buffer for raw data */
+			iterator->buf = (ToastBuffer *) palloc0(sizeof(ToastBuffer));
+			init_toast_buffer(iterator->buf, toast_pointer.va_rawsize, false);
+			iterator->source = NULL;
+			iterator->ctrlc = 0;
+			iterator->compressed = true;
+			iterator->done = false;
+		}
+		else
+		{
+			iterator->buf = iterator->fetch_datum_iterator->buf;
+			iterator->source = NULL;
+			iterator->ctrlc = 0;
+			iterator->compressed = false;
+			iterator->done = false;
+		}
+	}
+	else if (VARATT_IS_EXTERNAL_INDIRECT(attr))
+	{
+		/*
+		 * This is an indirect pointer --- dereference it
+		 */
+		struct varatt_indirect redirect;
+
+		VARATT_EXTERNAL_GET_POINTER(redirect, attr);
+		attr = (struct varlena *) redirect.pointer;
+
+		/* nested indirect Datums aren't allowed */
+		Assert(!VARATT_IS_EXTERNAL_INDIRECT(attr));
+
+		/* recurse in case value is still extended in some other way */
+		iterator = create_detoast_iterator(

Re: [proposal] de-TOAST'ing using a iterator

2019-07-10 Thread Binguo Bao
Hi Thomas,
I've fixed the warnings.

Thomas Munro  于2019年7月5日周五 下午12:21写道:

> On Thu, Jun 20, 2019 at 1:51 AM Binguo Bao  wrote:
> > Hi hackers!
> > This proposal aims to provide the ability to de-TOAST a fully TOAST'd
> and compressed field using an iterator and then update the appropriate
> parts of the code to use the iterator where possible instead of
> de-TOAST'ing and de-compressing the entire value. Examples where this can
> be helpful include using position() from the beginning of the value, or
> doing a pattern or substring match.
> >
> > de-TOAST iterator overview:
> > 1. The caller requests the slice of the attribute value from the
> de-TOAST iterator.
> > 2. The de-TOAST iterator checks if there is a slice available in the
> output buffer, if there is, return the result directly,
> > otherwise goto the step3.
> > 3. The de-TOAST iterator checks if there is the slice available in the
> input buffer, if there is, goto step44. Otherwise,
> > call fetch_datum_iterator to fetch datums from disk to input buffer.
> > 4. If the data in the input buffer is compressed, extract some data from
> the input buffer to the output buffer until the caller's
> > needs are met.
> >
> > I've implemented the prototype and apply it to the position() function
> to test performance.
>
> Hi Binguo,
>
> Interesting work, and nice performance improvements so far.  Just by
> the way, the patch currently generates warnings:
>
> https://travis-ci.org/postgresql-cfbot/postgresql/builds/554345719
>
> --
> Thomas Munro
> https://enterprisedb.com
>
From 0ac012867be04b409bfc4e67306fd9f3e5b4785a Mon Sep 17 00:00:00 2001
From: BBG 
Date: Tue, 4 Jun 2019 22:56:42 +0800
Subject: [PATCH] de-TOASTing using a iterator

---
 src/backend/access/heap/tuptoaster.c | 489 +++
 src/backend/utils/adt/varlena.c  |  48 ++--
 src/include/access/tuptoaster.h  |  92 +++
 3 files changed, 613 insertions(+), 16 deletions(-)

diff --git a/src/backend/access/heap/tuptoaster.c b/src/backend/access/heap/tuptoaster.c
index 55d6e91..07bb69e 100644
--- a/src/backend/access/heap/tuptoaster.c
+++ b/src/backend/access/heap/tuptoaster.c
@@ -83,6 +83,13 @@ static int	toast_open_indexes(Relation toastrel,
 static void toast_close_indexes(Relation *toastidxs, int num_indexes,
 LOCKMODE lock);
 static void init_toast_snapshot(Snapshot toast_snapshot);
+static FetchDatumIterator create_fetch_datum_iterator(struct varlena *attr);
+static bool free_fetch_datum_iterator(FetchDatumIterator iter);
+static int32 fetch_datum_iterate(FetchDatumIterator iter);
+static void init_toast_buffer(ToastBuffer *buf, int size, bool compressed);
+static bool free_toast_buffer(ToastBuffer *buf);
+static int32 pglz_decompress_iterate(ToastBuffer *source, ToastBuffer *dest,
+	 DetoastIterator iter, int32 length);
 
 
 /* --
@@ -347,6 +354,145 @@ heap_tuple_untoast_attr_slice(struct varlena *attr,
 
 
 /* --
+ * create_detoast_iterator -
+ *
+ * Initialize detoast iterator.
+ * --
+ */
+DetoastIterator create_detoast_iterator(struct varlena *attr) {
+	struct varatt_external toast_pointer;
+	DetoastIterator iterator = NULL;
+	if (VARATT_IS_EXTERNAL_ONDISK(attr))
+	{
+		/*
+		 * This is an externally stored datum --- create fetch datum iterator
+		 */
+		iterator = (DetoastIterator) palloc0(sizeof(DetoastIteratorData));
+		iterator->fetch_datum_iterator = create_fetch_datum_iterator(attr);
+		VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
+		if (VARATT_EXTERNAL_IS_COMPRESSED(toast_pointer))
+		{
+			/* If it's compressed, prepare buffer for raw data */
+			iterator->buf = (ToastBuffer *) palloc0(sizeof(ToastBuffer));
+			init_toast_buffer(iterator->buf, toast_pointer.va_rawsize, false);
+			iterator->source = NULL;
+			iterator->ctrlc = 0;
+			iterator->compressed = true;
+			iterator->done = false;
+		}
+		else
+		{
+			iterator->buf = iterator->fetch_datum_iterator->buf;
+			iterator->source = NULL;
+			iterator->ctrlc = 0;
+			iterator->compressed = false;
+			iterator->done = false;
+		}
+	}
+	else if (VARATT_IS_EXTERNAL_INDIRECT(attr))
+	{
+		/*
+		 * This is an indirect pointer --- dereference it
+		 */
+		struct varatt_indirect redirect;
+
+		VARATT_EXTERNAL_GET_POINTER(redirect, attr);
+		attr = (struct varlena *) redirect.pointer;
+
+		/* nested indirect Datums aren't allowed */
+		Assert(!VARATT_IS_EXTERNAL_INDIRECT(attr));
+
+		/* recurse in case value is still extended in some other way */
+		iterator = create_detoast_iterator(attr);
+
+	}
+	else if (VARATT_IS_COMPRESSED(attr))
+	{
+		/*
+		 * This is a compressed value inside of the main tuple
+		 */
+		iterator = (DetoastIterator) palloc0(sizeof(DetoastIteratorData));
+		iterator->fetch_datum_iterator = NULL;
+		iterator->source = 

Re: Optimize partial TOAST decompression

2019-07-09 Thread Binguo Bao
Tomas Vondra  于2019年7月10日周三 上午5:12写道:

> On Sat, Jul 06, 2019 at 05:23:37PM +0200, Tomas Vondra wrote:
> >On Sat, Jul 06, 2019 at 02:27:56AM +0800, Binguo Bao wrote:
> >>Hi, Tomas!
> >>Thanks for your testing and the suggestion.
> >>
> >>That's quite bizarre behavior - it does work with a prefix, but not with
> >>>suffix. And the exact ERROR changes after the prefix query.
> >>
> >>
> >>I think bug is caused by "#2  0x004c3b08 in
> >>heap_tuple_untoast_attr_slice (attr=, sliceoffset=0,
> >>slicelength=-1) at tuptoaster.c:315",
> >>since I ignore the case where slicelength is negative, and I've appended
> >>some comments for heap_tuple_untoast_attr_slice for the case.
> >>
> >>FWIW the new code added to this
> >>>function does not adhere to our code style, and would deserve some
> >>>additional explanation of what it's doing/why. Same for the
> >>>heap_tuple_untoast_attr_slice, BTW.
> >>
> >>
> >>I've added more comments to explain the code's behavior.
> >>Besides, I also modified the macro "TOAST_COMPRESS_RAWDATA" to
> >>"TOAST_COMPRESS_DATA" since
> >>it is used to get toast compressed data rather than raw data.
> >>
> >
> >Thanks, this seems to address the issue - I can no longer reproduce the
> >crashes, allowing the benchmark to complete. I'm attaching the script I
> >used and spreadsheet with a summary of results.
> >
> >For the cases I've tested (100k - 10M values, different compressibility,
> >different prefix/length values), the results are kinda mixed - many
> >cases got much faster (~2x), but other cases got slower too. We're
> >however talking about queries taking a couple of miliseconds, so in
> >absolute numbers the differences are small.
> >
> >That does not mean the optimization is useless - but the example shared
> >at the beginning of this thread is quite extreme, as the values are
> >extremely compressible. Each value is ~38MB (in plaintext), but a table
> >with 100 such values has only ~40MB. Tha's 100:1 compression ratio,
> >which I think is not typical for real-world data sets.
> >
> >The data I've used are less extreme, depending on the fraction of random
> >data in values.
> >
> >I went through the code too. I've reworder a couple of comments and code
> >style issues, but there are a couple of more serious issues.
> >
> >
> >1) Why rename TOAST_COMPRESS_RAWDATA to TOAST_COMPRESS_DATA?
> >
> >This seems unnecessary, and it discards the clear hint that it's about
> >accessing the *raw* data, and the relation to TOAST_COMPRESS_RAWSIZE.
> >IMHO we should keep the original naming.
> >
> >
> >2) pglz_maximum_compressed_size signatures are confusing
> >
> >There are two places with pglz_maximum_compressed_size signature, and
> >those places are kinda out of sync when it comes to parameter names:
> >
> > int32
> > pglz_maximum_compressed_size(int32 raw_slice_size,
> >  int32 total_compressed_size)
> >
> > extern
> > int32 pglz_maximum_compressed_size(int32 raw_slice_size,
> >int32 raw_size);
> >
> >Also, pg_lzcompress.c has no concept of a "slice" because it only deals
> >with simple compression, slicing is responsibility of the tuptoaster. So
> >we should not mix those two, not even in comments.
> >
> >
> >I propose tweaks per the attached patch - I think it makes the code
> >clearer, and it's mostly cosmetic stuff. But I haven't tested the
> >changes beyond "it compiles".
> >
> >
> >regards
> >
>
> FWIW I've done another round of tests with larger values (up to ~10MB)
> and the larger the values the better the speedup (a bit as expected).
> Attached is the script I used and a spreasheet with a summary.
>
> We still need a patch addressing the review comments, though.
>
>
> regards
>
> --
> Tomas Vondra  http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>

Hi, Tomas!
Sorry for the late reply.
Thank you for further testing, I am trying to reproduce your first test
summary,
since I think the performance of the patch will not drop in almost all
cases.

Besides, If a value is composed of random characters,
it will be hard to be compressed, because pglz requires a 25% compression
ratio by default or not worth it.
This means that querying the value will not trigger the patch. But the
first test resu

Re: Optimize partial TOAST decompression

2019-07-05 Thread Binguo Bao
Tomas Vondra  于2019年7月5日周五 上午1:46写道:

> I've done a bit of testing and benchmaring on this patch today, and
> there's a bug somewhere, making it look like there are corrupted data.
>
> What I'm seeing is this:
>
> CREATE TABLE t (a text);
>
> -- attached is data for one row
> COPY t FROM '/tmp/t.data';
>
>
> SELECT length(substr(a,1000)) from t;
> psql: ERROR:  compressed data is corrupted
>
> SELECT length(substr(a,0,1000)) from t;
>  length
> 
> 999
> (1 row)
>
> SELECT length(substr(a,1000)) from t;
> psql: ERROR:  invalid memory alloc request size 2018785106
>
> That's quite bizarre behavior - it does work with a prefix, but not with
> suffix. And the exact ERROR changes after the prefix query. (Of course,
> on master it works in all cases.)
>
> The backtrace (with the patch applied) looks like this:
>
> #0  toast_decompress_datum (attr=0x12572e0) at tuptoaster.c:2291
> #1  toast_decompress_datum (attr=0x12572e0) at tuptoaster.c:2277
> #2  0x004c3b08 in heap_tuple_untoast_attr_slice (attr= out>, sliceoffset=0, slicelength=-1) at tuptoaster.c:315
> #3  0x0085c1e5 in pg_detoast_datum_slice (datum=,
> first=, count=) at fmgr.c:1767
> #4  0x00833b7a in text_substring (str=133761519127512, start=0,
> length=, length_not_specified=) at
> varlena.c:956
> ...
>
> I've only observed this with a very small number of rows (the  data is
> generated randomly with different compressibility etc.), so I'm only
> attaching one row that exhibits this issue.
>
> My guess is toast_fetch_datum_slice() gets confused by the headers or
> something, or something like that. FWIW the new code added to this
> function does not adhere to our code style, and would deserve some
> additional explanation of what it's doing/why. Same for the
> heap_tuple_untoast_attr_slice, BTW.
>
>
> regards
>
> --
> Tomas Vondra  http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>

Hi, Tomas!
Thanks for your testing and the suggestion.

That's quite bizarre behavior - it does work with a prefix, but not with
> suffix. And the exact ERROR changes after the prefix query.


I think bug is caused by "#2  0x004c3b08 in
heap_tuple_untoast_attr_slice (attr=, sliceoffset=0,
slicelength=-1) at tuptoaster.c:315",
since I ignore the case where slicelength is negative, and I've appended
some comments for heap_tuple_untoast_attr_slice for the case.

FWIW the new code added to this
> function does not adhere to our code style, and would deserve some
> additional explanation of what it's doing/why. Same for the
> heap_tuple_untoast_attr_slice, BTW.


I've added more comments to explain the code's behavior.
Besides, I also modified the macro "TOAST_COMPRESS_RAWDATA" to
"TOAST_COMPRESS_DATA" since
it is used to get toast compressed data rather than raw data.

Best Regards, Binguo Bao.
From bcf04278b4d5956cd5f5fdab0d954b36adfd0022 Mon Sep 17 00:00:00 2001
From: BBG 
Date: Sun, 2 Jun 2019 19:18:46 +0800
Subject: [PATCH] Optimize partial TOAST decompression

---
 src/backend/access/heap/tuptoaster.c | 57 
 src/common/pg_lzcompress.c   | 26 
 src/include/common/pg_lzcompress.h   |  1 +
 3 files changed, 72 insertions(+), 12 deletions(-)

diff --git a/src/backend/access/heap/tuptoaster.c b/src/backend/access/heap/tuptoaster.c
index 55d6e91..1eb6cca 100644
--- a/src/backend/access/heap/tuptoaster.c
+++ b/src/backend/access/heap/tuptoaster.c
@@ -61,7 +61,8 @@ typedef struct toast_compress_header
  */
 #define TOAST_COMPRESS_HDRSZ		((int32) sizeof(toast_compress_header))
 #define TOAST_COMPRESS_RAWSIZE(ptr) (((toast_compress_header *) (ptr))->rawsize)
-#define TOAST_COMPRESS_RAWDATA(ptr) \
+#define TOAST_COMPRESS_SIZE(ptr)	((int32) VARSIZE(ptr) - TOAST_COMPRESS_HDRSZ)
+#define TOAST_COMPRESS_DATA(ptr) \
 	(((char *) (ptr)) + TOAST_COMPRESS_HDRSZ)
 #define TOAST_COMPRESS_SET_RAWSIZE(ptr, len) \
 	(((toast_compress_header *) (ptr))->rawsize = (len))
@@ -252,6 +253,8 @@ heap_tuple_untoast_attr(struct varlena *attr)
  *
  *		Public entry point to get back part of a toasted value
  *		from compression or external storage.
+ *
+ * Note if slicelength is negative, return suffix of the value.
  * --
  */
 struct varlena *
@@ -273,8 +276,23 @@ heap_tuple_untoast_attr_slice(struct varlena *attr,
 		if (!VARATT_EXTERNAL_IS_COMPRESSED(toast_pointer))
 			return toast_fetch_datum_slice(attr, sliceoffset, slicelength);
 
-		/* fetch it back (compressed marker will get set automatically) */
-		preslice = toast_fetch_datum(attr);
+		/*
+		 * If only need the prefix slice, fetch enough datums to decompress.
+		 * Otherwise, fetch all datums.
+		 */
+		if (slicelength > 0 && sliceoffset 

Re: Optimize partial TOAST decompression

2019-07-03 Thread Binguo Bao
Paul Ramsey  于2019年7月2日周二 下午10:46写道:

> This looks good to me. A little commentary around why
> pglz_maximum_compressed_size() returns a universally correct answer
> (there's no way the compressed size can ever be larger than this
> because...) would be nice for peasants like myself.
>
> If you're looking to continue down this code line in your next patch,
> the next TODO item is a little more involved: a user-land (ala
> PG_DETOAST_DATUM) iterator API for access of TOAST datums would allow
> the optimization of searching of large objects like JSONB types, and
> so on, where the thing you are looking for is not at a known location
> in the object. So, things like looking for a particular substring in a
> string, or looking for a particular key in a JSONB. "Iterate until you
> find the thing." would allow optimization of some code lines that
> currently require full decompression of the objects.
>
> P.
>

Thanks for your comment. I've updated the patch.
As for the iterator API, I've implemented a de-TOAST iterator actually[0].
And I’m looking for more of its application scenarios and perfecting it.
Any comments would be much appreciated.

Best Regards, Binguo Bao.

[0]
https://www.postgresql.org/message-id/flat/cal-ogks_onzpc9m9bxpcztmofwulcfkyecekiagxzwrl8kx...@mail.gmail.com
From 2e4e2838937ec6fa1404fe529e7ed303e391d1b2 Mon Sep 17 00:00:00 2001
From: BBG 
Date: Sun, 2 Jun 2019 19:18:46 +0800
Subject: [PATCH] Optimize partial TOAST decompression

---
 src/backend/access/heap/tuptoaster.c | 24 +---
 src/common/pg_lzcompress.c   | 26 ++
 src/include/common/pg_lzcompress.h   |  1 +
 3 files changed, 44 insertions(+), 7 deletions(-)

diff --git a/src/backend/access/heap/tuptoaster.c b/src/backend/access/heap/tuptoaster.c
index 55d6e91..684f1b2 100644
--- a/src/backend/access/heap/tuptoaster.c
+++ b/src/backend/access/heap/tuptoaster.c
@@ -266,6 +266,7 @@ heap_tuple_untoast_attr_slice(struct varlena *attr,
 	if (VARATT_IS_EXTERNAL_ONDISK(attr))
 	{
 		struct varatt_external toast_pointer;
+		int32 max_size;
 
 		VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
 
@@ -273,8 +274,13 @@ heap_tuple_untoast_attr_slice(struct varlena *attr,
 		if (!VARATT_EXTERNAL_IS_COMPRESSED(toast_pointer))
 			return toast_fetch_datum_slice(attr, sliceoffset, slicelength);
 
-		/* fetch it back (compressed marker will get set automatically) */
-		preslice = toast_fetch_datum(attr);
+		max_size = pglz_maximum_compressed_size(sliceoffset + slicelength,
+toast_pointer.va_rawsize);
+		/*
+		 * Be sure to get enough compressed slice
+		 * and compressed marker will get set automatically
+		 */
+		preslice = toast_fetch_datum_slice(attr, 0, max_size);
 	}
 	else if (VARATT_IS_EXTERNAL_INDIRECT(attr))
 	{
@@ -2031,7 +2037,8 @@ toast_fetch_datum(struct varlena *attr)
  *	Reconstruct a segment of a Datum from the chunks saved
  *	in the toast relation
  *
- *	Note that this function only supports non-compressed external datums.
+ *	Note that this function supports non-compressed external datums
+ *	and compressed external datum slices at the start of the object.
  * --
  */
 static struct varlena *
@@ -2072,10 +2079,9 @@ toast_fetch_datum_slice(struct varlena *attr, int32 sliceoffset, int32 length)
 	VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
 
 	/*
-	 * It's nonsense to fetch slices of a compressed datum -- this isn't lo_*
-	 * we can't return a compressed datum which is meaningful to toast later
+	 * It's meaningful to fetch slices at the start of a compressed datum.
 	 */
-	Assert(!VARATT_EXTERNAL_IS_COMPRESSED(toast_pointer));
+	Assert(!VARATT_EXTERNAL_IS_COMPRESSED(toast_pointer) || 0 == sliceoffset);
 
 	attrsize = toast_pointer.va_extsize;
 	totalchunks = ((attrsize - 1) / TOAST_MAX_CHUNK_SIZE) + 1;
@@ -2091,7 +2097,11 @@ toast_fetch_datum_slice(struct varlena *attr, int32 sliceoffset, int32 length)
 
 	result = (struct varlena *) palloc(length + VARHDRSZ);
 
-	SET_VARSIZE(result, length + VARHDRSZ);
+	if (VARATT_EXTERNAL_IS_COMPRESSED(toast_pointer)) {
+		SET_VARSIZE_COMPRESSED(result, length + VARHDRSZ);
+	} else {
+		SET_VARSIZE(result, length + VARHDRSZ);
+	}
 
 	if (length == 0)
 		return result;			/* Can save a lot of work at this point! */
diff --git a/src/common/pg_lzcompress.c b/src/common/pg_lzcompress.c
index 988b398..80ed17a 100644
--- a/src/common/pg_lzcompress.c
+++ b/src/common/pg_lzcompress.c
@@ -771,3 +771,29 @@ pglz_decompress(const char *source, int32 slen, char *dest,
 	 */
 	return (char *) dp - dest;
 }
+
+
+
+/* --
+ * pglz_max_compressed_size -
+ *
+ * 		Calculate the maximum size of the compressed slice corresponding to the
+ * 		raw slice. Return the maximum size, or raw size if maximum size is larger
+ * 		than raw size.
+ * --
+ */
+int32
+pglz_maximum_compressed_size(int32 raw_slice_size, int32 raw_size)
+{
+	int32 result;
+
+	/*
+	 * Use int64 to

Re: Optimize partial TOAST decompression

2019-07-01 Thread Binguo Bao
Hi!

> Andrey Borodin  于2019年6月29日周六 下午9:48写道:

> Hi!
> Please, do not use top-posting, i.e. reply style where you quote whole
> message under your response. It makes reading of archives terse.
>
> > 24 июня 2019 г., в 7:53, Binguo Bao  написал(а):
> >
> >> This is not correct: L bytes of compressed data do not always can be
> decoded into at least L bytes of data. At worst we have one control byte
> per 8 bytes of literal bytes. This means at most we need (L*9 + 8) / 8
> bytes with current pglz format.
> >
> > Good catch! I've corrected the related code in the patch.
> > ...
> > <0001-Optimize-partial-TOAST-decompression-2.patch>
>
> I've took a look into the code.
> I think we should extract function for computation of max_compressed_size
> and put it somewhere along with pglz code. Just in case something will
> change something about pglz so that they would not forget about compression
> algorithm assumption.
>
> Also I suggest just using 64 bit computation to avoid overflows. And I
> think it worth to check if max_compressed_size is whole data and use min of
> (max_compressed_size, uncompressed_data_size).
>
> Also you declared needsize and max_compressed_size too far from use. But
> this will be solved by function extraction anyway.
>
> Thanks!
>
> Best regards, Andrey Borodin.


Thanks for the suggestion.
I've extracted function for computation for max_compressed_size and put the
function into pg_lzcompress.c.

Best regards, Binguo Bao.
From 79a1b4c292a0629df9d7ba3dc04e879aadca7a61 Mon Sep 17 00:00:00 2001
From: BBG 
Date: Sun, 2 Jun 2019 19:18:46 +0800
Subject: [PATCH] Optimize partial TOAST decompression

---
 src/backend/access/heap/tuptoaster.c | 24 +---
 src/common/pg_lzcompress.c   | 22 ++
 src/include/common/pg_lzcompress.h   |  1 +
 3 files changed, 40 insertions(+), 7 deletions(-)

diff --git a/src/backend/access/heap/tuptoaster.c b/src/backend/access/heap/tuptoaster.c
index 55d6e91..684f1b2 100644
--- a/src/backend/access/heap/tuptoaster.c
+++ b/src/backend/access/heap/tuptoaster.c
@@ -266,6 +266,7 @@ heap_tuple_untoast_attr_slice(struct varlena *attr,
 	if (VARATT_IS_EXTERNAL_ONDISK(attr))
 	{
 		struct varatt_external toast_pointer;
+		int32 max_size;
 
 		VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
 
@@ -273,8 +274,13 @@ heap_tuple_untoast_attr_slice(struct varlena *attr,
 		if (!VARATT_EXTERNAL_IS_COMPRESSED(toast_pointer))
 			return toast_fetch_datum_slice(attr, sliceoffset, slicelength);
 
-		/* fetch it back (compressed marker will get set automatically) */
-		preslice = toast_fetch_datum(attr);
+		max_size = pglz_maximum_compressed_size(sliceoffset + slicelength,
+toast_pointer.va_rawsize);
+		/*
+		 * Be sure to get enough compressed slice
+		 * and compressed marker will get set automatically
+		 */
+		preslice = toast_fetch_datum_slice(attr, 0, max_size);
 	}
 	else if (VARATT_IS_EXTERNAL_INDIRECT(attr))
 	{
@@ -2031,7 +2037,8 @@ toast_fetch_datum(struct varlena *attr)
  *	Reconstruct a segment of a Datum from the chunks saved
  *	in the toast relation
  *
- *	Note that this function only supports non-compressed external datums.
+ *	Note that this function supports non-compressed external datums
+ *	and compressed external datum slices at the start of the object.
  * --
  */
 static struct varlena *
@@ -2072,10 +2079,9 @@ toast_fetch_datum_slice(struct varlena *attr, int32 sliceoffset, int32 length)
 	VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
 
 	/*
-	 * It's nonsense to fetch slices of a compressed datum -- this isn't lo_*
-	 * we can't return a compressed datum which is meaningful to toast later
+	 * It's meaningful to fetch slices at the start of a compressed datum.
 	 */
-	Assert(!VARATT_EXTERNAL_IS_COMPRESSED(toast_pointer));
+	Assert(!VARATT_EXTERNAL_IS_COMPRESSED(toast_pointer) || 0 == sliceoffset);
 
 	attrsize = toast_pointer.va_extsize;
 	totalchunks = ((attrsize - 1) / TOAST_MAX_CHUNK_SIZE) + 1;
@@ -2091,7 +2097,11 @@ toast_fetch_datum_slice(struct varlena *attr, int32 sliceoffset, int32 length)
 
 	result = (struct varlena *) palloc(length + VARHDRSZ);
 
-	SET_VARSIZE(result, length + VARHDRSZ);
+	if (VARATT_EXTERNAL_IS_COMPRESSED(toast_pointer)) {
+		SET_VARSIZE_COMPRESSED(result, length + VARHDRSZ);
+	} else {
+		SET_VARSIZE(result, length + VARHDRSZ);
+	}
 
 	if (length == 0)
 		return result;			/* Can save a lot of work at this point! */
diff --git a/src/common/pg_lzcompress.c b/src/common/pg_lzcompress.c
index 988b398..2b5f112 100644
--- a/src/common/pg_lzcompress.c
+++ b/src/common/pg_lzcompress.c
@@ -771,3 +771,25 @@ pglz_decompress(const char *source, int32 slen, char *dest,
 	 */
 	return (char *) dp - dest;
 }
+
+
+
+/* --
+ * pglz_max_compressed_size -
+ *
+ * 		Calculate the maximum size of the compressed slice corresponding to the
+ 

Re: Optimize partial TOAST decompression

2019-06-23 Thread Binguo Bao
> This is not correct: L bytes of compressed data do not always can be
decoded into at least L bytes of data. At worst we have one control byte
per 8 bytes of literal bytes. This means at most we need (L*9 + 8) / 8
bytes with current pglz format.

Good catch! I've corrected the related code in the patch.

> Also, I'm not sure you use SET_VARSIZE_COMPRESSED correctly...
I followed the code in toast_fetch_datum function[1], and I didn't see any
wrong with it.

Best regards, Binguo Bao

[1]
https://github.com/postgres/postgres/blob/master/src/backend/access/heap/tuptoaster.c#L1898

Andrey Borodin  于2019年6月23日周日 下午5:23写道:

> Hi, Binguo!
>
> > 2 июня 2019 г., в 19:48, Binguo Bao  написал(а):
> >
> > Hi, hackers!
> 
> > This seems to have a 10x improvement. If the number of toast data chunks
> is more, I believe that patch can play a greater role, there are about 200
> related TOAST data chunks for each entry in the case.
>
> That's really cool that you could produce meaningful patch long before end
> of GSoC!
>
> I'll describe what is going on a little:
> 1. We have compressed value, which resides in TOAST table.
> 2. We want only some fraction of this value. We want some prefix with
> length L.
> 3. Previously Paul Ramsey submitted patch that omits decompression of
> value beyond desired L bytes.
> 4. Binguo's patch tries to do not fetch compressed data which will not bee
> needed to decompressor. In fact it fetches L bytes from TOAST table.
>
> This is not correct: L bytes of compressed data do not always can be
> decoded into at least L bytes of data. At worst we have one control byte
> per 8 bytes of literal bytes. This means at most we need (L*9 + 8) / 8
> bytes with current pglz format.
>
> Also, I'm not sure you use SET_VARSIZE_COMPRESSED correctly...
>
> Best regards, Andrey Borodin.
From 505dcc4fdef18710c98718685180190056b4d9ed Mon Sep 17 00:00:00 2001
From: BBG 
Date: Sun, 2 Jun 2019 19:18:46 +0800
Subject: [PATCH] Optimize partial TOAST decompression

---
 src/backend/access/heap/tuptoaster.c | 34 +++---
 1 file changed, 27 insertions(+), 7 deletions(-)

diff --git a/src/backend/access/heap/tuptoaster.c b/src/backend/access/heap/tuptoaster.c
index 55d6e91..89ffc2d 100644
--- a/src/backend/access/heap/tuptoaster.c
+++ b/src/backend/access/heap/tuptoaster.c
@@ -262,6 +262,8 @@ heap_tuple_untoast_attr_slice(struct varlena *attr,
 	struct varlena *result;
 	char	   *attrdata;
 	int32		attrsize;
+	int32 		needsize;
+	int32 		max_compressed_size;
 
 	if (VARATT_IS_EXTERNAL_ONDISK(attr))
 	{
@@ -273,8 +275,22 @@ heap_tuple_untoast_attr_slice(struct varlena *attr,
 		if (!VARATT_EXTERNAL_IS_COMPRESSED(toast_pointer))
 			return toast_fetch_datum_slice(attr, sliceoffset, slicelength);
 
-		/* fetch it back (compressed marker will get set automatically) */
-		preslice = toast_fetch_datum(attr);
+		needsize = sliceoffset + slicelength;
+		if (needsize > (INT_MAX / 9))
+		{
+			/* Approximate to avoid overflow */
+			max_compressed_size = (needsize / 8 + 1) * 9;
+		}
+		else
+		{
+			max_compressed_size = (needsize * 9) / 8 + 1;
+		}
+
+		/*
+		 * Be sure to get enough compressed slice
+		 * and compressed marker will get set automatically
+		 */
+		preslice = toast_fetch_datum_slice(attr, 0, max_compressed_size);
 	}
 	else if (VARATT_IS_EXTERNAL_INDIRECT(attr))
 	{
@@ -2031,7 +2047,8 @@ toast_fetch_datum(struct varlena *attr)
  *	Reconstruct a segment of a Datum from the chunks saved
  *	in the toast relation
  *
- *	Note that this function only supports non-compressed external datums.
+ *	Note that this function supports non-compressed external datums
+ *	and compressed external datum slices at the start of the object.
  * --
  */
 static struct varlena *
@@ -2072,10 +2089,9 @@ toast_fetch_datum_slice(struct varlena *attr, int32 sliceoffset, int32 length)
 	VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
 
 	/*
-	 * It's nonsense to fetch slices of a compressed datum -- this isn't lo_*
-	 * we can't return a compressed datum which is meaningful to toast later
+	 * It's meaningful to fetch slices at the start of a compressed datum.
 	 */
-	Assert(!VARATT_EXTERNAL_IS_COMPRESSED(toast_pointer));
+	Assert(!VARATT_EXTERNAL_IS_COMPRESSED(toast_pointer) || 0 == sliceoffset);
 
 	attrsize = toast_pointer.va_extsize;
 	totalchunks = ((attrsize - 1) / TOAST_MAX_CHUNK_SIZE) + 1;
@@ -2091,7 +2107,11 @@ toast_fetch_datum_slice(struct varlena *attr, int32 sliceoffset, int32 length)
 
 	result = (struct varlena *) palloc(length + VARHDRSZ);
 
-	SET_VARSIZE(result, length + VARHDRSZ);
+	if (VARATT_EXTERNAL_IS_COMPRESSED(toast_pointer)) {
+		SET_VARSIZE_COMPRESSED(result, length + VARHDRSZ);
+	} else {
+		SET_VARSIZE(result, length + VARHDRSZ);
+	}
 
 	if (length == 0)
 		return result;			/* Can save a lot of work at this point! */
-- 
2.7.4



[proposal] de-TOAST'ing using a iterator

2019-06-19 Thread Binguo Bao
Hi hackers!
This proposal aims to provide the ability to de-TOAST a fully TOAST'd and
compressed field using an iterator and then update the appropriate parts of
the code to use the iterator where possible instead of de-TOAST'ing and
de-compressing the entire value. Examples where this can be helpful include
using position() from the beginning of the value, or doing a pattern or
substring match.

de-TOAST iterator overview:
1. The caller requests the slice of the attribute value from the de-TOAST
iterator.
2. The de-TOAST iterator checks if there is a slice available in the output
buffer, if there is, return the result directly,
otherwise goto the step3.
3. The de-TOAST iterator checks if there is the slice available in the
input buffer, if there is, goto step44. Otherwise,
call fetch_datum_iterator to fetch datums from disk to input buffer.
4. If the data in the input buffer is compressed, extract some data from
the input buffer to the output buffer until the caller's
needs are met.

I've implemented the prototype and apply it to the position() function to
test performance.
Test tables:
-
create table detoast_c (id serial primary key,
a text
);
insert into detoast_c (a) select
repeat('1234567890-=abcdefghijklmnopqrstuvwxyz', 100)||'321' as a from
generate_series(1,100);

create table detoast_u (id serial primary key,
a text
);
alter table detoast_u alter a set storage external;
insert into detoast_u (a) select
repeat('1234567890-=abcdefghijklmnopqrstuvwxyz', 100)||'321' as a from
generate_series(1,100);
**
-
 query|
 master (ms)  |  patch  (ms)  |
-
select position('123' in a) from detoast_c;| 4054.838   |
1440.735   |
-
select position('321' in a) from detoast_c;| 25549.270 |
 27696.245  |
-
select position('123' in a) from detoast_u;| 8116.996   |
1386.802   |
-
select position('321' in a) from detoast_u | 28442.116 |
 27672.319  |
-
**
It can be seen that the iterator greatly improves the efficiency of partial
de-TOAST when it has almost no degradation in full de-TOAST efficiency.
Next, I will continue to study how to apply iterators to more queries
and improve iterator efficiency, such as using macros instead of function
calls.

The patch is also available on github[1].
Any suggestions or comments would be much appreciated:)

Best regards, Binguo Bao.

[1] https://github.com/djydewang/postgres/pull/1/files
From b071bf9801f45d5ff48422b44bd5042ae19ea20c Mon Sep 17 00:00:00 2001
From: BBG 
Date: Tue, 4 Jun 2019 22:56:42 +0800
Subject: [PATCH] de-TOASTing using a iterator

---
 src/backend/access/heap/tuptoaster.c | 488 +++
 src/backend/utils/adt/varlena.c  |  48 ++--
 src/include/access/tuptoaster.h  |  92 +++
 3 files changed, 612 insertions(+), 16 deletions(-)

diff --git a/src/backend/access/heap/tuptoaster.c b/src/backend/access/heap/tuptoaster.c
index 55d6e91..92dc87a 100644
--- a/src/backend/access/heap/tuptoaster.c
+++ b/src/backend/access/heap/tuptoaster.c
@@ -83,6 +83,13 @@ static int	toast_open_indexes(Relation toastrel,
 static void toast_close_indexes(Relation *toastidxs, int num_indexes,
 LOCKMODE lock);
 static void init_toast_snapshot(Snapshot toast_snapshot);
+static FetchDatumIterator create_fetch_datum_iterator(struct varlena *attr);
+static bool free_fetch_datum_iterator(FetchDatumIterator iter);
+static int32 fetch_datum_iterate(FetchDatumIterator iter);
+static void init_toast_buffer(ToastBuffer *buf, int size, bool compressed);
+static bool free_toast_buffer(ToastBuffer *buf);
+static int32 pglz_decompress_iterate(ToastBuffer *source, ToastBuffer *dest,
+	 DetoastIterator iter, int32 length);
 
 
 /* --
@@ -347,6 +354,145 @@ heap_tuple_untoast_attr_slice(struct varlena *attr,
 
 
 /* --
+ * create_detoast_iterator -
+ *
+ * Initialize detoast iterator.
+ * --
+ */
+DetoastIterator create_detoast_iterator(struct varlena *attr) {
+	struct varatt_external toast_pointer;
+	DetoastIterator iterator = NULL

Optimize partial TOAST decompression

2019-06-02 Thread Binguo Bao
Hi, hackers!
I'm a student participating in GSoC 2019 and my project is related to TOAST
slices.
When I'm getting familiar with the postgresql codebase, I find that
PG_DETOAST_DATUM_SLICE, when to run on a compressed TOAST entry, will fetch
all compressed data chunks then extract the relevant slice. Obviously, this
is unnecessary, we only need to fetch the data chunks we need.

The patch optimizes partial TOAST decompression.
For an example of the improvement possible, this trivial example:
-
create table slicingtest (
id serial primary key,
a text
);

insert into slicingtest (a) select
repeat('1234567890-=abcdefghijklmnopqrstuvwxyz', 100) as a from
generate_series(1,100);
\timing
select sum(length(substr(a, 0, 20))) from slicingtest;
-
environment: Linux 4.15.0-33-generic #36~16.04.1-Ubuntu x86_64 GNU/Linux
On master, I get
Time: 28.123 ms (Take ten times average)
With the patch, I get
Time: 2.306 ms  (take ten times average)

This seems to have a 10x improvement. If the number of toast data chunks is
more, I believe that patch can play a greater role, there are about 200
related TOAST data chunks for each entry in the case.

Related discussion:
https://www.postgresql.org/message-id/flat/CACowWR07EDm7Y4m2kbhN_jnys%3DBBf9A6768RyQdKm_%3DNpkcaWg%40mail.gmail.com

Best regards, Binguo Bao.
From a7c99439ffe309526b57fe26ab367e4b7bf62f39 Mon Sep 17 00:00:00 2001
From: BBG 
Date: Sun, 2 Jun 2019 19:18:46 +0800
Subject: [PATCH] Optimize partial TOAST decompression

---
 src/backend/access/heap/tuptoaster.c | 21 ++---
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/src/backend/access/heap/tuptoaster.c b/src/backend/access/heap/tuptoaster.c
index 55d6e91..7d30538 100644
--- a/src/backend/access/heap/tuptoaster.c
+++ b/src/backend/access/heap/tuptoaster.c
@@ -273,8 +273,11 @@ heap_tuple_untoast_attr_slice(struct varlena *attr,
 		if (!VARATT_EXTERNAL_IS_COMPRESSED(toast_pointer))
 			return toast_fetch_datum_slice(attr, sliceoffset, slicelength);
 
-		/* fetch it back (compressed marker will get set automatically) */
-		preslice = toast_fetch_datum(attr);
+		/*
+		 * Be sure to get enough compressed slice
+		 * and compressed marker will get set automatically
+		 */
+		preslice = toast_fetch_datum_slice(attr, 0, sliceoffset + slicelength + 1);
 	}
 	else if (VARATT_IS_EXTERNAL_INDIRECT(attr))
 	{
@@ -2031,7 +2034,8 @@ toast_fetch_datum(struct varlena *attr)
  *	Reconstruct a segment of a Datum from the chunks saved
  *	in the toast relation
  *
- *	Note that this function only supports non-compressed external datums.
+ *	Note that this function supports non-compressed external datums
+ *	and compressed external datum slices at the start of the object.
  * --
  */
 static struct varlena *
@@ -2072,10 +2076,9 @@ toast_fetch_datum_slice(struct varlena *attr, int32 sliceoffset, int32 length)
 	VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
 
 	/*
-	 * It's nonsense to fetch slices of a compressed datum -- this isn't lo_*
-	 * we can't return a compressed datum which is meaningful to toast later
+	 * It's meaningful to fetch slices at the start of a compressed datum.
 	 */
-	Assert(!VARATT_EXTERNAL_IS_COMPRESSED(toast_pointer));
+	Assert(!VARATT_EXTERNAL_IS_COMPRESSED(toast_pointer) || 0 == sliceoffset);
 
 	attrsize = toast_pointer.va_extsize;
 	totalchunks = ((attrsize - 1) / TOAST_MAX_CHUNK_SIZE) + 1;
@@ -2091,7 +2094,11 @@ toast_fetch_datum_slice(struct varlena *attr, int32 sliceoffset, int32 length)
 
 	result = (struct varlena *) palloc(length + VARHDRSZ);
 
-	SET_VARSIZE(result, length + VARHDRSZ);
+	if (VARATT_EXTERNAL_IS_COMPRESSED(toast_pointer)) {
+		SET_VARSIZE_COMPRESSED(result, length + VARHDRSZ);
+	} else {
+		SET_VARSIZE(result, length + VARHDRSZ);
+	}
 
 	if (length == 0)
 		return result;			/* Can save a lot of work at this point! */
-- 
2.7.4



Re: pglz performance

2019-05-23 Thread Binguo Bao
:  Decompressor pglz_decompress_hacked8result 14.438165
NOTICE:  Decompressor pglz_decompress_hacked16   result 15.716280
NOTICE:  Decompressor pglz_decompress_vanillaresult 21.034867
NOTICE:  Decompressor pglz_decompress_hacked_seg result 12.090609
NOTICE:

compressor score (summ of all times):
NOTICE:  compressor pglz_compress_vanilla result 276.776671
NOTICE:  compressor pglz_compress_hacked_seg result 222.407850

There are some questions now:
1. The compression algorithm is not compatible with the original
compression algorithm now.
2. If the idea works, we need to test more data, what kind of data is more
appropriate?
Any comments are much appreciated.

Best regards, Binguo Bao.

[0]
https://docs.google.com/document/d/1V4oXV5vGrGx24deBTKKM7bVdO3Cy-zfj-wQ4dkBUCl4/edit
[1] https://github.com/djydewang/test_pglz