Re: [HACKERS] extensible external toast tuple support

2013-07-02 Thread Andres Freund
On 2013-07-01 17:46:51 -0400, Robert Haas wrote:
 But backing up a minute, this is really a significant behavior change
 that is independent of the purpose of the rest of this patch.  What
 you're proposing here is that every time we consider toasting a value
 on update, we should first check whether it's byte-for-byte equivalent
 to the old value.  That may or may not be a good idea - it will suck
 if, for example, a user repeatedly updates a very long string by
 changing only the last character thereof, but it will win in other
 cases.

In that case the old value will rather likely just have been read just
before, so the price of rereading should be relatively low.

But:
  Whether it's a good idea or not, I think it deserves to be a
 separate patch.  For purposes of this patch, I think you should just
 assume that any external-indirect value needs to be retoasted, just as
 we currently assume for untoasted values.

is a rather valid point. I've split the patch accordingly. The second
patch is *not* supposed to be applied together with patch 1 but rather
included for reference.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services
From 9f69c755c4e69495847138e3b7ce47dee78b6eb0 Mon Sep 17 00:00:00 2001
From: Andres Freund and...@anarazel.de
Date: Tue, 11 Jun 2013 23:25:26 +0200
Subject: [PATCH 1/2] Add support for multiple kinds of external toast datums

There are several usecases where our current representation of external toast
datums is limiting:
* adding new compression schemes
* avoidance of repeated detoasting
* externally decoded toast tuples

For that support 'tags' on external (varattrib_1b_e) varlenas which recoin the
current va_len_1be field to store the tag (or type) of a varlena. To determine
the actual length a macro VARTAG_SIZE(tag) is added which can be used to map
from a tag to the actual length.

This patch adds support for 'indirect' tuples which point to some externally
allocated memory containing a toast tuple. It also implements the stub for a
different compression algorithm.

0.4
---
 src/backend/access/heap/tuptoaster.c | 110 ++---
 src/include/access/tuptoaster.h  |   5 +
 src/include/postgres.h   |  84 +
 src/test/regress/expected/indirect_toast.out | 151 +++
 src/test/regress/input/create_function_1.source  |   5 +
 src/test/regress/output/create_function_1.source |   4 +
 src/test/regress/parallel_schedule   |   2 +-
 src/test/regress/regress.c   |  92 ++
 src/test/regress/serial_schedule |   1 +
 src/test/regress/sql/indirect_toast.sql  |  61 +
 10 files changed, 472 insertions(+), 43 deletions(-)
 create mode 100644 src/test/regress/expected/indirect_toast.out
 create mode 100644 src/test/regress/sql/indirect_toast.sql

diff --git a/src/backend/access/heap/tuptoaster.c b/src/backend/access/heap/tuptoaster.c
index fc37ceb..e759502 100644
--- a/src/backend/access/heap/tuptoaster.c
+++ b/src/backend/access/heap/tuptoaster.c
@@ -44,9 +44,6 @@
 
 #undef TOAST_DEBUG
 
-/* Size of an EXTERNAL datum that contains a standard TOAST pointer */
-#define TOAST_POINTER_SIZE (VARHDRSZ_EXTERNAL + sizeof(struct varatt_external))
-
 /*
  * Testing whether an externally-stored value is compressed now requires
  * comparing extsize (the actual length of the external data) to rawsize
@@ -87,11 +84,11 @@ static struct varlena *toast_fetch_datum_slice(struct varlena * attr,
  * heap_tuple_fetch_attr -
  *
  *	Public entry point to get back a toasted value from
- *	external storage (possibly still in compressed format).
+ *	external source (possibly still in compressed format).
  *
  * This will return a datum that contains all the data internally, ie, not
- * relying on external storage, but it can still be compressed or have a short
- * header.
+ * relying on external storage or memory, but it can still be compressed or
+ * have a short header.
  --
  */
 struct varlena *
@@ -99,13 +96,35 @@ heap_tuple_fetch_attr(struct varlena * attr)
 {
 	struct varlena *result;
 
-	if (VARATT_IS_EXTERNAL(attr))
+	if (VARATT_IS_EXTERNAL_ONDISK(attr))
 	{
 		/*
 		 * This is an external stored plain value
 		 */
 		result = toast_fetch_datum(attr);
 	}
+	else if (VARATT_IS_EXTERNAL_INDIRECT(attr))
+	{
+		/*
+		 * copy into the caller's memory context. That's not required in all
+		 * cases but sufficient for now since this is mainly used when we need
+		 * to persist a Datum for unusually long time, like in a HOLD cursor.
+		 */
+		struct varatt_indirect redirect;
+		VARATT_EXTERNAL_GET_POINTER(redirect, attr);
+		attr = (struct varlena *)redirect.pointer;
+
+		/* nested indirect Datums aren't allowed */
+		Assert(!VARATT_IS_EXTERNAL_INDIRECT(attr));
+
+		/* doesn't make much sense, but better handle it */
+		if 

Re: [HACKERS] extensible external toast tuple support

2013-07-02 Thread Robert Haas
On Tue, Jul 2, 2013 at 11:06 AM, Andres Freund and...@2ndquadrant.com wrote:
 In that case the old value will rather likely just have been read just
 before, so the price of rereading should be relatively low.

Maybe in terms of I/O, but there's still CPU.

 is a rather valid point. I've split the patch accordingly. The second
 patch is *not* supposed to be applied together with patch 1 but rather
 included for reference.

OK, committed patch 1.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] extensible external toast tuple support

2013-07-01 Thread Andres Freund
On 2013-06-28 11:25:50 -0400, Robert Haas wrote:
 On Fri, Jun 28, 2013 at 10:53 AM, Andres Freund and...@2ndquadrant.com 
 wrote:
  Why does toast_insert_or_update() need to go through all the
  rigamarole in toast_datum_differs()?  I would have thought that it
  could simply treat any external-indirect value as needing to be
  detoasted and retoasted, since the destination is the disk anyhow.
 
  We could do that, yes. But I think it might be better not to: If we
  simplify the tuples used in a query to not reference ondisk tuples
  anymore and we then UPDATE using that new version I would rather not
  retoast all the unchanged columns.
 
  I can e.g. very well imagine that we decide to resolve toasted Datums to
  indirect Datums during an UPDATE if there are multiple BEFORE UPDATE
  triggers to avoid detoasting in each and every one of them. Such a tuple
  will then passed to heap_update...
 
 I must be missing something.  At that point, yes, you'd like to avoid
 re-toasting unnecessarily, but ISTM you've already bought the farm.
 Unless I'm misunderstanding the code as written, you'd just end up
 writing the indirect pointer back out to disk in that scenario.
 That's not gonna work...

You didn't misunderstand anything unfortunately. I broke that somewhere
along the road, probably when factoring things out to
toast_datum_differs(). Fixed...
Which shows that we need tests. I've added some using a function in
regress.c that makes all datums indirect. Together with a triggers that
allows some testing.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services
From 7b2c83aa337e639d2467359ab7b0cc00a2f36cde Mon Sep 17 00:00:00 2001
From: Andres Freund and...@anarazel.de
Date: Tue, 11 Jun 2013 23:25:26 +0200
Subject: [PATCH] Add support for multiple kinds of external toast datums

There are several usecases where our current representation of external toast
datums is limiting:
* adding new compression schemes
* avoidance of repeated detoasting
* externally decoded toast tuples

For that support 'tags' on external (varattrib_1b_e) varlenas which recoin the
current va_len_1be field to store the tag (or type) of a varlena. To determine
the actual length a macro VARTAG_SIZE(tag) is added which can be used to map
from a tag to the actual length.

This patch adds support for 'indirect' tuples which point to some externally
allocated memory containing a toast tuple. It also implements the stub for a
different compression algorithm.

0.3
---
 src/backend/access/heap/tuptoaster.c | 189 ---
 src/include/access/tuptoaster.h  |   5 +
 src/include/postgres.h   |  84 +++---
 src/test/regress/expected/indirect_toast.out | 151 ++
 src/test/regress/input/create_function_1.source  |   5 +
 src/test/regress/output/create_function_1.source |   4 +
 src/test/regress/parallel_schedule   |   2 +-
 src/test/regress/regress.c   |  92 +++
 src/test/regress/serial_schedule |   1 +
 src/test/regress/sql/indirect_toast.sql  |  61 
 10 files changed, 552 insertions(+), 42 deletions(-)
 create mode 100644 src/test/regress/expected/indirect_toast.out
 create mode 100644 src/test/regress/sql/indirect_toast.sql

diff --git a/src/backend/access/heap/tuptoaster.c b/src/backend/access/heap/tuptoaster.c
index fc37ceb..fd3362f 100644
--- a/src/backend/access/heap/tuptoaster.c
+++ b/src/backend/access/heap/tuptoaster.c
@@ -44,9 +44,6 @@
 
 #undef TOAST_DEBUG
 
-/* Size of an EXTERNAL datum that contains a standard TOAST pointer */
-#define TOAST_POINTER_SIZE (VARHDRSZ_EXTERNAL + sizeof(struct varatt_external))
-
 /*
  * Testing whether an externally-stored value is compressed now requires
  * comparing extsize (the actual length of the external data) to rawsize
@@ -87,11 +84,11 @@ static struct varlena *toast_fetch_datum_slice(struct varlena * attr,
  * heap_tuple_fetch_attr -
  *
  *	Public entry point to get back a toasted value from
- *	external storage (possibly still in compressed format).
+ *	external source (possibly still in compressed format).
  *
  * This will return a datum that contains all the data internally, ie, not
- * relying on external storage, but it can still be compressed or have a short
- * header.
+ * relying on external storage or memory, but it can still be compressed or
+ * have a short header.
  --
  */
 struct varlena *
@@ -99,13 +96,35 @@ heap_tuple_fetch_attr(struct varlena * attr)
 {
 	struct varlena *result;
 
-	if (VARATT_IS_EXTERNAL(attr))
+	if (VARATT_IS_EXTERNAL_ONDISK(attr))
 	{
 		/*
 		 * This is an external stored plain value
 		 */
 		result = toast_fetch_datum(attr);
 	}
+	else if (VARATT_IS_EXTERNAL_INDIRECT(attr))
+	{
+		/*
+		 * copy into the caller's memory context. That's not required in all
+		 * cases but sufficient for now since this 

Re: [HACKERS] extensible external toast tuple support

2013-07-01 Thread Robert Haas
On Mon, Jul 1, 2013 at 3:49 PM, Andres Freund and...@2ndquadrant.com wrote:
 I must be missing something.  At that point, yes, you'd like to avoid
 re-toasting unnecessarily, but ISTM you've already bought the farm.
 Unless I'm misunderstanding the code as written, you'd just end up
 writing the indirect pointer back out to disk in that scenario.
 That's not gonna work...

 You didn't misunderstand anything unfortunately. I broke that somewhere
 along the road, probably when factoring things out to
 toast_datum_differs(). Fixed...
 Which shows that we need tests. I've added some using a function in
 regress.c that makes all datums indirect. Together with a triggers that
 allows some testing.

OK, that's nifty.

+   elog(ERROR, deleteing a tuple indirect datums 
doesn't make sense);

That's spelled wrong.

+   Assert(VARATT_IS_EXTERNAL_ONDISK(old_value));
+
+   /* fast path for the common case where we have the toast oid available 
*/
+   if (VARATT_IS_EXTERNAL_ONDISK(old_value) 
+   VARATT_IS_EXTERNAL_ONDISK(new_value))
+   return memcmp((char *) old_value, (char *) new_value,
+ VARSIZE_EXTERNAL(old_value)) != 0;

You've already asserted VARATT_IS_EXTERNAL_ONDISK(old_value), so there
can't be any need to test that condition again.

+   changed = memcmp(VARDATA_ANY(old_data), VARDATA_ANY(new_data),
+ VARSIZE_ANY_EXHDR(old_data)) != 0;
+   if (!changed)
+   *needs_rewrite = true;
+
+   /* heap_tuple_untoast_attr copied data */
+   pfree(old_data);
+   if (new_value != new_data)
+   pfree(new_data);
+   return changed;

So... basically, we always set needs_rewrite to !changed, except when
VARATT_IS_EXTERNAL_ONDISK(new_value).  In that latter case we return
after doing memcmp().  That's kind of byzantine and could probably be
simplified by pushing the first if-clause of this function up into the
caller and then making the function's job to simply compare the datums
on the assumption that the new datum is not external on-disk.  Then
you wouldn't need this funny three-way return value done as two
Booleans.

But backing up a minute, this is really a significant behavior change
that is independent of the purpose of the rest of this patch.  What
you're proposing here is that every time we consider toasting a value
on update, we should first check whether it's byte-for-byte equivalent
to the old value.  That may or may not be a good idea - it will suck
if, for example, a user repeatedly updates a very long string by
changing only the last character thereof, but it will win in other
cases.  Whether it's a good idea or not, I think it deserves to be a
separate patch.  For purposes of this patch, I think you should just
assume that any external-indirect value needs to be retoasted, just as
we currently assume for untoasted values.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] extensible external toast tuple support

2013-06-28 Thread Robert Haas
On Thu, Jun 27, 2013 at 12:56 PM, Andres Freund and...@2ndquadrant.com wrote:
 Please find attached the next version of the extensible toast
 support. There basically are two changes:

 * handle indirect toast tuples properly in heap_tuple_fetch_attr
   and related places
 * minor comment adjustments

It looks to me like you need to pass true, rather than false, as the
first argument to TrapMacro:

+#define VARTAG_SIZE(tag) \
+   ((tag) == VARTAG_INDIRECT ? sizeof(varatt_indirect) :   \
+(tag) == VARTAG_ONDISK ? sizeof(varatt_external) : \
+TrapMacro(false, unknown vartag))

Still looking at the rest of this.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] extensible external toast tuple support

2013-06-28 Thread Robert Haas
On Fri, Jun 28, 2013 at 9:23 AM, Robert Haas robertmh...@gmail.com wrote:
 On Thu, Jun 27, 2013 at 12:56 PM, Andres Freund and...@2ndquadrant.com 
 wrote:
 Please find attached the next version of the extensible toast
 support. There basically are two changes:

 * handle indirect toast tuples properly in heap_tuple_fetch_attr
   and related places
 * minor comment adjustments

 It looks to me like you need to pass true, rather than false, as the
 first argument to TrapMacro:

 +#define VARTAG_SIZE(tag) \
 +   ((tag) == VARTAG_INDIRECT ? sizeof(varatt_indirect) :   \
 +(tag) == VARTAG_ONDISK ? sizeof(varatt_external) : \
 +TrapMacro(false, unknown vartag))

 Still looking at the rest of this.

Why does toast_insert_or_update() need to go through all the
rigamarole in toast_datum_differs()?  I would have thought that it
could simply treat any external-indirect value as needing to be
detoasted and retoasted, since the destination is the disk anyhow.

Do you see external-indirect values getting used for anything other
than logical replication?  Is there code to do so anywhere?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] extensible external toast tuple support

2013-06-28 Thread Andres Freund
On 2013-06-28 09:49:53 -0400, Robert Haas wrote:
 On Fri, Jun 28, 2013 at 9:23 AM, Robert Haas robertmh...@gmail.com wrote:
  On Thu, Jun 27, 2013 at 12:56 PM, Andres Freund and...@2ndquadrant.com 
  wrote:
  Please find attached the next version of the extensible toast
  support. There basically are two changes:
 
  * handle indirect toast tuples properly in heap_tuple_fetch_attr
and related places
  * minor comment adjustments
 
  It looks to me like you need to pass true, rather than false, as the
  first argument to TrapMacro:
 
  +#define VARTAG_SIZE(tag) \
  +   ((tag) == VARTAG_INDIRECT ? sizeof(varatt_indirect) :   \
  +(tag) == VARTAG_ONDISK ? sizeof(varatt_external) : \
  +TrapMacro(false, unknown vartag))

Heh. I was obviously trying to be too cute ;)

Good idea  thanks for committing it separately.

 Do you see external-indirect values getting used for anything other
 than logical replication?  Is there code to do so anywhere?

Yes and not really. I think we can reuse it to avoid repeated detoastings, even
in relatively normal queries. What I am absolutely not sure yet is how
to automatically decide when we want to keep the tuple in memory because
it will be beneficial, and when not because of the obvious memory
overhead. And how to keep track of the memory context used to allocate
the referenced data.
There are some usecases where I think it might be relatively easy to
decide its a win. E.g tuples passed to triggers if there are
multiple ones of them.

I've played a bit with using that functionality in C functions, but
nothing publishable, more to make sure things work.

 Why does toast_insert_or_update() need to go through all the
 rigamarole in toast_datum_differs()?  I would have thought that it
 could simply treat any external-indirect value as needing to be
 detoasted and retoasted, since the destination is the disk anyhow.

We could do that, yes. But I think it might be better not to: If we
simplify the tuples used in a query to not reference ondisk tuples
anymore and we then UPDATE using that new version I would rather not
retoast all the unchanged columns.

I can e.g. very well imagine that we decide to resolve toasted Datums to
indirect Datums during an UPDATE if there are multiple BEFORE UPDATE
triggers to avoid detoasting in each and every one of them. Such a tuple
will then passed to heap_update...


Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] extensible external toast tuple support

2013-06-28 Thread Robert Haas
On Fri, Jun 28, 2013 at 10:53 AM, Andres Freund and...@2ndquadrant.com wrote:
 Why does toast_insert_or_update() need to go through all the
 rigamarole in toast_datum_differs()?  I would have thought that it
 could simply treat any external-indirect value as needing to be
 detoasted and retoasted, since the destination is the disk anyhow.

 We could do that, yes. But I think it might be better not to: If we
 simplify the tuples used in a query to not reference ondisk tuples
 anymore and we then UPDATE using that new version I would rather not
 retoast all the unchanged columns.

 I can e.g. very well imagine that we decide to resolve toasted Datums to
 indirect Datums during an UPDATE if there are multiple BEFORE UPDATE
 triggers to avoid detoasting in each and every one of them. Such a tuple
 will then passed to heap_update...

I must be missing something.  At that point, yes, you'd like to avoid
re-toasting unnecessarily, but ISTM you've already bought the farm.
Unless I'm misunderstanding the code as written, you'd just end up
writing the indirect pointer back out to disk in that scenario.
That's not gonna work...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] extensible external toast tuple support snappy prototype

2013-06-27 Thread Robert Haas
On Wed, Jun 19, 2013 at 3:27 AM, Andres Freund and...@2ndquadrant.com wrote:
 There will be a newer version of the patch coming today or tomorrow, so
 there's probably no point in looking at the one linked above before
 that...

This patch is marked as Ready for Committer in the CommitFest app.
But it is not clear to me where the patch is that is being proposed
for commit.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] extensible external toast tuple support snappy prototype

2013-06-27 Thread Hitoshi Harada
On Thu, Jun 27, 2013 at 6:08 AM, Robert Haas robertmh...@gmail.com wrote:

 On Wed, Jun 19, 2013 at 3:27 AM, Andres Freund and...@2ndquadrant.com
 wrote:
  There will be a newer version of the patch coming today or tomorrow, so
  there's probably no point in looking at the one linked above before
  that...

 This patch is marked as Ready for Committer in the CommitFest app.
 But it is not clear to me where the patch is that is being proposed
 for commit.



 No, this conversation is about patch #1153, pluggable toast compression,
which is in Needs Review, and you may be confused with #1127, extensible
external toast tuple support.


-- 
Hitoshi Harada


Re: [HACKERS] extensible external toast tuple support

2013-06-27 Thread Andres Freund
Hi,

Please find attached the next version of the extensible toast
support. There basically are two changes:

* handle indirect toast tuples properly in heap_tuple_fetch_attr
  and related places
* minor comment adjustments

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services
From 5c7079067f870a65d65ee09cccbf743c88b64c82 Mon Sep 17 00:00:00 2001
From: Andres Freund and...@anarazel.de
Date: Tue, 11 Jun 2013 23:25:26 +0200
Subject: [PATCH] Add support for multiple kinds of external toast datums

There are several usecases where our current representation of external toast
datums is limiting:
* adding new compression schemes
* avoidance of repeated detoasting
* externally decoded toast tuples

For that support 'tags' on external (varattrib_1b_e) varlenas which recoin the
current va_len_1be field to store the tag (or type) of a varlena. To determine
the actual length a macro VARTAG_SIZE(tag) is added which can be used to map
from a tag to the actual length.

This patch adds support for 'indirect' tuples which point to some externally
allocated memory containing a toast tuple. It also implements the stub for a
different compression algorithm.
---
 src/backend/access/heap/tuptoaster.c | 142 +++
 src/include/c.h  |   2 +
 src/include/postgres.h   |  84 +++--
 3 files changed, 191 insertions(+), 37 deletions(-)

diff --git a/src/backend/access/heap/tuptoaster.c b/src/backend/access/heap/tuptoaster.c
index fc37ceb..5ee5e33 100644
--- a/src/backend/access/heap/tuptoaster.c
+++ b/src/backend/access/heap/tuptoaster.c
@@ -87,11 +87,11 @@ static struct varlena *toast_fetch_datum_slice(struct varlena * attr,
  * heap_tuple_fetch_attr -
  *
  *	Public entry point to get back a toasted value from
- *	external storage (possibly still in compressed format).
+ *	external source (possibly still in compressed format).
  *
  * This will return a datum that contains all the data internally, ie, not
- * relying on external storage, but it can still be compressed or have a short
- * header.
+ * relying on external storage or memory, but it can still be compressed or
+ * have a short header.
  --
  */
 struct varlena *
@@ -99,13 +99,35 @@ heap_tuple_fetch_attr(struct varlena * attr)
 {
 	struct varlena *result;
 
-	if (VARATT_IS_EXTERNAL(attr))
+	if (VARATT_IS_EXTERNAL_ONDISK(attr))
 	{
 		/*
 		 * This is an external stored plain value
 		 */
 		result = toast_fetch_datum(attr);
 	}
+	else if (VARATT_IS_EXTERNAL_INDIRECT(attr))
+	{
+		/*
+		 * copy into the caller's memory context. That's not required in all
+		 * cases but sufficient for now since this is mainly used when we need
+		 * to persist a Datum for unusually long time, like in a HOLD cursor.
+		 */
+		struct varatt_indirect redirect;
+		VARATT_EXTERNAL_GET_POINTER(redirect, attr);
+		attr = (struct varlena *)redirect.pointer;
+
+		/* nested indirect Datums aren't allowed */
+		Assert(!VARATT_IS_EXTERNAL_INDIRECT(attr));
+
+		/* doesn't make much sense, but better handle it */
+		if (VARATT_IS_EXTERNAL_ONDISK(attr))
+			return heap_tuple_fetch_attr(attr);
+
+		/* copy datum verbatim */
+		result = (struct varlena *) palloc(VARSIZE_ANY(attr));
+		memcpy(result, attr, VARSIZE_ANY(attr));
+	}
 	else
 	{
 		/*
@@ -128,7 +150,7 @@ heap_tuple_fetch_attr(struct varlena * attr)
 struct varlena *
 heap_tuple_untoast_attr(struct varlena * attr)
 {
-	if (VARATT_IS_EXTERNAL(attr))
+	if (VARATT_IS_EXTERNAL_ONDISK(attr))
 	{
 		/*
 		 * This is an externally stored datum --- fetch it back from there
@@ -145,6 +167,17 @@ heap_tuple_untoast_attr(struct varlena * attr)
 			pfree(tmp);
 		}
 	}
+	else if (VARATT_IS_EXTERNAL_INDIRECT(attr))
+	{
+		struct varatt_indirect redirect;
+		VARATT_EXTERNAL_GET_POINTER(redirect, attr);
+		attr = (struct varlena *)redirect.pointer;
+
+		/* nested indirect Datums aren't allowed */
+		Assert(!VARATT_IS_EXTERNAL_INDIRECT(attr));
+
+		attr = heap_tuple_untoast_attr(attr);
+	}
 	else if (VARATT_IS_COMPRESSED(attr))
 	{
 		/*
@@ -191,7 +224,7 @@ heap_tuple_untoast_attr_slice(struct varlena * attr,
 	char	   *attrdata;
 	int32		attrsize;
 
-	if (VARATT_IS_EXTERNAL(attr))
+	if (VARATT_IS_EXTERNAL_ONDISK(attr))
 	{
 		struct varatt_external toast_pointer;
 
@@ -204,6 +237,17 @@ heap_tuple_untoast_attr_slice(struct varlena * attr,
 		/* fetch it back (compressed marker will get set automatically) */
 		preslice = toast_fetch_datum(attr);
 	}
+	else if (VARATT_IS_EXTERNAL_INDIRECT(attr))
+	{
+		struct varatt_indirect redirect;
+		VARATT_EXTERNAL_GET_POINTER(redirect, attr);
+
+		/* nested indirect Datums aren't allowed */
+		Assert(!VARATT_IS_EXTERNAL_INDIRECT(redirect.pointer));
+
+		return heap_tuple_untoast_attr_slice(redirect.pointer,
+			 sliceoffset, slicelength);
+	}
 	else
 		preslice = attr;
 
@@ -267,7 +311,7 @@ toast_raw_datum_size(Datum value)
 	struct 

Re: [HACKERS] extensible external toast tuple support snappy prototype

2013-06-27 Thread Andres Freund
On 2013-06-27 09:17:10 -0700, Hitoshi Harada wrote:
 On Thu, Jun 27, 2013 at 6:08 AM, Robert Haas robertmh...@gmail.com wrote:
 
  On Wed, Jun 19, 2013 at 3:27 AM, Andres Freund and...@2ndquadrant.com
  wrote:
   There will be a newer version of the patch coming today or tomorrow, so
   there's probably no point in looking at the one linked above before
   that...
 
  This patch is marked as Ready for Committer in the CommitFest app.
  But it is not clear to me where the patch is that is being proposed
  for commit.
 
 
 
  No, this conversation is about patch #1153, pluggable toast compression,
 which is in Needs Review, and you may be confused with #1127, extensible
 external toast tuple support.

Well, actually this is the correct thread, and pluggable toast
compression developed out of it. You had marked #1127 as ready for
committer although you had noticed an omission in
heap_tuple_fetch_attr...
Answered with the new patch version toplevel in the thread, to make this
hopefully less confusing.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] extensible external toast tuple support snappy prototype

2013-06-19 Thread Hitoshi Harada
On Wed, Jun 5, 2013 at 8:01 AM, Andres Freund and...@2ndquadrant.comwrote:

 Two patches attached:
 1) add snappy to src/common. The integration needs some more work.
 2) Combined patch that adds indirect tuple and snappy compression. Those
 coul be separated, but this is an experiment so far...



I took a look at them a little.  This proposal is a super set of patch
#1127.
https://commitfest.postgresql.org/action/patch_view?id=1127

- endian.h is not found in my mac.  Commented it out, it builds clean.
- I don't see what the added is_inline flag means in
toast_compress_datum().  Obviously not used, but I wonder what was the
intention.
- By this,
 * compression method. We could just use the two bytes to store 3 other
 * compression methods but maybe we better don't paint ourselves in a
 * corner again.
you mean two bits, not two bytes?

And patch adds snappy-c in src/common.  I definitely like the idea to have
pluggability for different compression algorithm for datum, but I am not
sure if this location is a good place to add it.  Maybe we want a modern
algorithm other than pglz for different components across the system in
core, and it's better to let users choose which to add more.  The mapping
between the index number and compression algorithm should be consistent for
the entire life of database, so it should be defined at initdb time.  From
core maintainability perspective and binary size of postgres, I don't think
we want to put dozenes of different algorithms into core in the future.
And maybe someone will want to try BSD-incompatible algorithm privately.

I guess it's ok to use one more byte to indicate which compression is used
for the value.  It is a compressed datum and we don't expect something
short anyway.  I don't see big problems in this patch other than how to
manage the pluggability, but it is a WIP patch anyway, so I'm going to mark
it as Returned with Feedback.

Thanks,

-- 
Hitoshi Harada


Re: [HACKERS] extensible external toast tuple support snappy prototype

2013-06-19 Thread Andres Freund
On 2013-06-19 00:15:56 -0700, Hitoshi Harada wrote:
 On Wed, Jun 5, 2013 at 8:01 AM, Andres Freund and...@2ndquadrant.comwrote:
 
  Two patches attached:
  1) add snappy to src/common. The integration needs some more work.
  2) Combined patch that adds indirect tuple and snappy compression. Those
  coul be separated, but this is an experiment so far...
 
 
 
 I took a look at them a little.  This proposal is a super set of patch
 #1127.
 https://commitfest.postgresql.org/action/patch_view?id=1127
 
 - endian.h is not found in my mac.  Commented it out, it builds clean.
 - I don't see what the added is_inline flag means in
 toast_compress_datum().  Obviously not used, but I wonder what was the
 intention.

Hm. I don't think you've looked at the latest version of the patch,
check
http://archives.postgresql.org/message-id/20130614230625.gd19...@awork2.anarazel.de
- that should be linked from the commitfest. The is_inline part should
be gone there.

 - By this,
  * compression method. We could just use the two bytes to store 3 other
  * compression methods but maybe we better don't paint ourselves in a
  * corner again.
 you mean two bits, not two bytes?

Yes, typo... The plan is to use those two bits in the following way
- 00 pglz
- 01 snappy/lz4/whatever
- 10 another
- 11 one extra byte

 And patch adds snappy-c in src/common.  I definitely like the idea to have
 pluggability for different compression algorithm for datum, but I am not
 sure if this location is a good place to add it.  Maybe we want a modern
 algorithm other than pglz for different components across the system in
 core, and it's better to let users choose which to add more.  The mapping
 between the index number and compression algorithm should be consistent for
 the entire life of database, so it should be defined at initdb time.  From
 core maintainability perspective and binary size of postgres, I don't think
 we want to put dozenes of different algorithms into core in the future.
 And maybe someone will want to try BSD-incompatible algorithm privately.

We've argued about this in the linked thread and I still think we should
add one algorithm now, and when that one is outdated - which probably
will be some time - replace it. Building enough infrastructure to make
this really pluggable is not likely enough to be beneficial to many.

There will be a newer version of the patch coming today or tomorrow, so
there's probably no point in looking at the one linked above before
that...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] extensible external toast tuple support

2013-06-18 Thread Hitoshi Harada
On Fri, Jun 14, 2013 at 4:06 PM, Andres Freund and...@2ndquadrant.comwrote:


 Here's the updated version. It shouldn't contain any obvious WIP pieces
 anymore, although I think it needs some more documentation. I am just
 not sure where to add it yet, postgres.h seems like a bad place :/


I have a few comments and questions after reviewing this patch.

- heap_tuple_fetch_attr doesn't need to be updated to reflect ONDISK macro?
- I'm not sure if plural for datum is good to use.  Datum values?
- -1 from me to use enum for tag types, as I don't think it needs to be.
This looks more like other kind macros such as relkind, but I know
there's pros/cons
- don't we need cast for tag value comparison in VARTAG_SIZE macro, since
tag is unit8 and enum is signed int?
- Is there better way to represent ONDISK size, instead of magic number
18?  I'd suggest constructing it with sizeof(varatt_external).

Other than that, the patch applies fine and make check runs, though I don't
think the new code path is exercised well, as no one is creating INDIRECT
datum yet.

Also, I wonder how we could add more compression in datum, as well as how
we are going to add more compression options in database.  I'd love to see
pluggability here, as surely the core cannot support dozens of different
compression algorithms, but because the data on disk is critical and we
cannot do anything like user defined functions.  The algorithms should be
optional builtin so that once the system is initialized the the plugin
should not go away.  Anyway pluggable compression is off-topic here.

Thanks,
-- 
Hitoshi Harada


Re: [HACKERS] extensible external toast tuple support

2013-06-18 Thread Andres Freund
On 2013-06-18 00:56:17 -0700, Hitoshi Harada wrote:
 On Fri, Jun 14, 2013 at 4:06 PM, Andres Freund and...@2ndquadrant.comwrote:
 
 
  Here's the updated version. It shouldn't contain any obvious WIP pieces
  anymore, although I think it needs some more documentation. I am just
  not sure where to add it yet, postgres.h seems like a bad place :/
 
 
 I have a few comments and questions after reviewing this patch.

Cool!

 - heap_tuple_fetch_attr doesn't need to be updated to reflect ONDISK macro?

It calls toast_fetch_datum() for any kind of external datum, so it
should be fine since ONDISK is handled in there.

 - I'm not sure if plural for datum is good to use.  Datum values?

Not sure at all either. Supposedly data is the plural for datum, but for
the capitalized version Datums is used in other places, so I think it
might be fine.

 - -1 from me to use enum for tag types, as I don't think it needs to be.
 This looks more like other kind macros such as relkind, but I know
 there's pros/cons

Well, relkind cannot easily be a enum because it needs to be stored in a
char field. I like using enums because it gives you the chance of using
switch()es at some point and getting warned about missed cases.

Why do you dislike it?

 - don't we need cast for tag value comparison in VARTAG_SIZE macro, since
 tag is unit8 and enum is signed int?

I think it should be fine (and the compiler doesn't warn) due to the
integer promotion rules.

 - Is there better way to represent ONDISK size, instead of magic number
 18?  I'd suggest constructing it with sizeof(varatt_external).

I explicitly did not want to do that, since the numbers really don't
have anything to do with the struct size anymore. Otherwise the next
person reading that part will be confused because there's a 8byte struct
with the enum value 1. Or somebody will try adding another type of
external tuple that also needs 18 bytes by copying the sizeof(). Which
will fail in ugly, non-obvious ways.

 Other than that, the patch applies fine and make check runs, though I don't
 think the new code path is exercised well, as no one is creating INDIRECT
 datum yet.

Yea, I only use the API in the changeset extraction patch. That actually
worries me to. But I am not sure where to introduce usage of it in core
without making the patchset significantly larger. I was thinking of
adding an option to heap_form_tuple/heap_fill_tuple to allow it to
reference _4B Datums instead of copying them, but that would require
quite some adjustments on the callsites.

 Also, I wonder how we could add more compression in datum, as well as how
 we are going to add more compression options in database.  I'd love to see
 pluggability here, as surely the core cannot support dozens of different
 compression algorithms, but because the data on disk is critical and we
 cannot do anything like user defined functions.  The algorithms should be
 optional builtin so that once the system is initialized the the plugin
 should not go away.  Anyway pluggable compression is off-topic here.

Separate patchset by now, yep ;).

Thanks!

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] extensible external toast tuple support

2013-06-18 Thread Hitoshi Harada
On Tue, Jun 18, 2013 at 1:58 AM, Andres Freund and...@2ndquadrant.comwrote:

 On 2013-06-18 00:56:17 -0700, Hitoshi Harada wrote:
  On Fri, Jun 14, 2013 at 4:06 PM, Andres Freund and...@2ndquadrant.com
 wrote:
 
  
   Here's the updated version. It shouldn't contain any obvious WIP pieces
   anymore, although I think it needs some more documentation. I am just
   not sure where to add it yet, postgres.h seems like a bad place :/
  
  
  I have a few comments and questions after reviewing this patch.

 Cool!

  - heap_tuple_fetch_attr doesn't need to be updated to reflect ONDISK
 macro?

 It calls toast_fetch_datum() for any kind of external datum, so it
 should be fine since ONDISK is handled in there.


toast_fetch_datum doesn't expect the input is INDIRECT.  At least I see the
code path in the same file around toast_insert_or_update() where we have a
chance to (possibly accidentally) try to fetch ONDISK toasted value from
non-ONDISK datum.


  - -1 from me to use enum for tag types, as I don't think it needs to be.
  This looks more like other kind macros such as relkind, but I know
  there's pros/cons

 Well, relkind cannot easily be a enum because it needs to be stored in a
 char field. I like using enums because it gives you the chance of using
 switch()es at some point and getting warned about missed cases.

 Why do you dislike it?


 I put -1 just because it doesn't have to be now.  If you argue relkind is
char field, tag is also uint8.


  - don't we need cast for tag value comparison in VARTAG_SIZE macro, since
  tag is unit8 and enum is signed int?

 I think it should be fine (and the compiler doesn't warn) due to the
 integer promotion rules.



 - Is there better way to represent ONDISK size, instead of magic number
  18?  I'd suggest constructing it with sizeof(varatt_external).

 I explicitly did not want to do that, since the numbers really don't
 have anything to do with the struct size anymore. Otherwise the next
 person reading that part will be confused because there's a 8byte struct
 with the enum value 1. Or somebody will try adding another type of
 external tuple that also needs 18 bytes by copying the sizeof(). Which
 will fail in ugly, non-obvious ways.


Sounds reasonable.  I was just confused when I looked at it first.


  Other than that, the patch applies fine and make check runs, though I
 don't
  think the new code path is exercised well, as no one is creating INDIRECT
  datum yet.

 Yea, I only use the API in the changeset extraction patch. That actually
 worries me to. But I am not sure where to introduce usage of it in core
 without making the patchset significantly larger. I was thinking of
 adding an option to heap_form_tuple/heap_fill_tuple to allow it to
 reference _4B Datums instead of copying them, but that would require
 quite some adjustments on the callsites.


Maybe you can create a user-defined function that creates such datum for
testing, just in order to demonstrate it works fine.


  Also, I wonder how we could add more compression in datum, as well as how
  we are going to add more compression options in database.  I'd love to
 see
  pluggability here, as surely the core cannot support dozens of different
  compression algorithms, but because the data on disk is critical and we
  cannot do anything like user defined functions.  The algorithms should be
  optional builtin so that once the system is initialized the the plugin
  should not go away.  Anyway pluggable compression is off-topic here.

 Separate patchset by now, yep ;).


I just found.  Will look into it.

Thanks,


Re: [HACKERS] extensible external toast tuple support

2013-06-18 Thread Andres Freund
On 2013-06-18 10:13:39 -0700, Hitoshi Harada wrote:
 On Tue, Jun 18, 2013 at 1:58 AM, Andres Freund and...@2ndquadrant.comwrote:
 
  On 2013-06-18 00:56:17 -0700, Hitoshi Harada wrote:
   On Fri, Jun 14, 2013 at 4:06 PM, Andres Freund and...@2ndquadrant.com
  wrote:
  
   
Here's the updated version. It shouldn't contain any obvious WIP pieces
anymore, although I think it needs some more documentation. I am just
not sure where to add it yet, postgres.h seems like a bad place :/
   
   
   I have a few comments and questions after reviewing this patch.
 
  Cool!
 
   - heap_tuple_fetch_attr doesn't need to be updated to reflect ONDISK
  macro?
 
  It calls toast_fetch_datum() for any kind of external datum, so it
  should be fine since ONDISK is handled in there.
 
 
 toast_fetch_datum doesn't expect the input is INDIRECT.  At least I see the
 code path in the same file around toast_insert_or_update() where we have a
 chance to (possibly accidentally) try to fetch ONDISK toasted value from
 non-ONDISK datum.

Hm. Yes. I don't think that's really possible in any codepath I have
thought of - or tested - but that's not a good reason not to make it robust.

   - -1 from me to use enum for tag types, as I don't think it needs to be.
   This looks more like other kind macros such as relkind, but I know
   there's pros/cons
 
  Well, relkind cannot easily be a enum because it needs to be stored in a
  char field. I like using enums because it gives you the chance of using
  switch()es at some point and getting warned about missed cases.
 
  Why do you dislike it?
 
 
  I put -1 just because it doesn't have to be now.  If you argue relkind is
 char field, tag is also uint8.

Well, but to expose it to sql you need a numeric value that actually
translates to a visible ascii character.


 Maybe you can create a user-defined function that creates such datum for
 testing, just in order to demonstrate it works fine.

Hm. Will try whether I can think of something not completely pointless
;)


Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] extensible external toast tuple support

2013-06-14 Thread Andres Freund
On 2013-05-31 23:42:51 -0400, Robert Haas wrote:
 On Thu, May 30, 2013 at 7:42 AM, Andres Freund and...@2ndquadrant.com wrote:
  In
  http://archives.postgresql.org/message-id/20130216164231.GA15069%40awork2.anarazel.de
  I presented the need for 'indirect' toast tuples which point into memory
  instead of a toast table. In the comments to that proposal, off-list and
  in-person talks the wish to make that a more general concept has
  been voiced.
 
  The previous patch used varattrib_1b_e.va_len_1be to discern between
  different types of external tuples. That obviously only works if the
  data sizes of all possibly stored datum types are distinct which isn't
  nice. So what the newer patch now does is to rename that field into
  'va_tag' and decide based on that what kind of Datum we have. To get the
  actual length of that datum there now is a VARTAG_SIZE() macro which
  maps the tags back to size.
  To keep on-disk compatibility the size of an external toast tuple
  containing a varatt_external is used as its tag value.
 
  This should allow for fairly easy development of a new compression
  scheme for out-of-line toast tuples. It will *not* work for compressed
  inline tuples (i.e. VARATT_4B_C). I am not convinced that that is a
  problem or that if it is, that it cannot be solved separately.
 
  FWIW, in some quick microbenchmarks I couldn't find any performance
  difference due to the slightly more complex size computation which I do
  *not* find surprising.
 
  Opinions?
 
 Seems pretty sensible to me.  The patch is obviously WIP but the
 direction seems fine to me.

Here's the updated version. It shouldn't contain any obvious WIP pieces
anymore, although I think it needs some more documentation. I am just
not sure where to add it yet, postgres.h seems like a bad place :/

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services
From 654e24e9a615dcacea4d9714cf8cdbf6953983d5 Mon Sep 17 00:00:00 2001
From: Andres Freund and...@anarazel.de
Date: Tue, 11 Jun 2013 23:25:26 +0200
Subject: [PATCH] Add support for multiple kinds of external toast datums

There are several usecases where our current representation of external toast
datums is limiting:
* adding new compression schemes
* avoidance of repeated detoasting
* externally decoded toast tuples

For that support 'tags' on external (varattrib_1b_e) varlenas which recoin the
current va_len_1be field to store the tag (or type) of a varlena. To determine
the actual length a macro VARTAG_SIZE(tag) is added which can be used to map
from a tag to the actual length.

This patch adds support for 'indirect' tuples which point to some externally
allocated memory containing a toast tuple. It also implements the stub for a
different compression algorithm.
---
 src/backend/access/heap/tuptoaster.c | 100 +++
 src/include/c.h  |   2 +
 src/include/postgres.h   |  83 +
 3 files changed, 153 insertions(+), 32 deletions(-)

diff --git a/src/backend/access/heap/tuptoaster.c b/src/backend/access/heap/tuptoaster.c
index fc37ceb..99044d0 100644
--- a/src/backend/access/heap/tuptoaster.c
+++ b/src/backend/access/heap/tuptoaster.c
@@ -128,7 +128,7 @@ heap_tuple_fetch_attr(struct varlena * attr)
 struct varlena *
 heap_tuple_untoast_attr(struct varlena * attr)
 {
-	if (VARATT_IS_EXTERNAL(attr))
+	if (VARATT_IS_EXTERNAL_ONDISK(attr))
 	{
 		/*
 		 * This is an externally stored datum --- fetch it back from there
@@ -145,6 +145,15 @@ heap_tuple_untoast_attr(struct varlena * attr)
 			pfree(tmp);
 		}
 	}
+	else if (VARATT_IS_EXTERNAL_INDIRECT(attr))
+	{
+		struct varatt_indirect redirect;
+		VARATT_EXTERNAL_GET_POINTER(redirect, attr);
+		attr = (struct varlena *)redirect.pointer;
+		Assert(!VARATT_IS_EXTERNAL_INDIRECT(attr));
+
+		attr = heap_tuple_untoast_attr(attr);
+	}
 	else if (VARATT_IS_COMPRESSED(attr))
 	{
 		/*
@@ -191,7 +200,7 @@ heap_tuple_untoast_attr_slice(struct varlena * attr,
 	char	   *attrdata;
 	int32		attrsize;
 
-	if (VARATT_IS_EXTERNAL(attr))
+	if (VARATT_IS_EXTERNAL_ONDISK(attr))
 	{
 		struct varatt_external toast_pointer;
 
@@ -204,6 +213,13 @@ heap_tuple_untoast_attr_slice(struct varlena * attr,
 		/* fetch it back (compressed marker will get set automatically) */
 		preslice = toast_fetch_datum(attr);
 	}
+	else if (VARATT_IS_EXTERNAL_INDIRECT(attr))
+	{
+		struct varatt_indirect redirect;
+		VARATT_EXTERNAL_GET_POINTER(redirect, attr);
+		return heap_tuple_untoast_attr_slice(redirect.pointer,
+			 sliceoffset, slicelength);
+	}
 	else
 		preslice = attr;
 
@@ -267,7 +283,7 @@ toast_raw_datum_size(Datum value)
 	struct varlena *attr = (struct varlena *) DatumGetPointer(value);
 	Size		result;
 
-	if (VARATT_IS_EXTERNAL(attr))
+	if (VARATT_IS_EXTERNAL_ONDISK(attr))
 	{
 		/* va_rawsize is the size of the original datum -- including header */
 		struct 

Re: [HACKERS] extensible external toast tuple support

2013-06-14 Thread Alvaro Herrera
Andres Freund escribió:

 Here's the updated version. It shouldn't contain any obvious WIP pieces
 anymore, although I think it needs some more documentation. I am just
 not sure where to add it yet, postgres.h seems like a bad place :/

How about a new file, say src/include/access/toast.h?

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] extensible external toast tuple support

2013-06-14 Thread Andres Freund
On 2013-06-14 19:14:15 -0400, Alvaro Herrera wrote:
 Andres Freund escribió:
 
  Here's the updated version. It shouldn't contain any obvious WIP pieces
  anymore, although I think it needs some more documentation. I am just
  not sure where to add it yet, postgres.h seems like a bad place :/
 
 How about a new file, say src/include/access/toast.h?

Well, the question is if that buys us all that much, we need the varlena
definitions to be available pretty much everywhere. Except of section 3
- which we reduced to be pretty darn small these days - of postgres.h
pretty much all of it is concerned with Datums, a good of them being
varlenas.
We could move section 1) into its own file and unconditionally include
it...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] extensible external toast tuple support snappy prototype

2013-06-07 Thread Robert Haas
On Wed, Jun 5, 2013 at 11:01 AM, Andres Freund and...@2ndquadrant.com wrote:
 On 2013-05-31 23:42:51 -0400, Robert Haas wrote:
  This should allow for fairly easy development of a new compression
  scheme for out-of-line toast tuples. It will *not* work for compressed
  inline tuples (i.e. VARATT_4B_C). I am not convinced that that is a
  problem or that if it is, that it cannot be solved separately.

 Seems pretty sensible to me.  The patch is obviously WIP but the
 direction seems fine to me.

 So, I played a bit more with this, with an eye towards getting this into
 a non WIP state, but: While I still think the method for providing
 indirect external Datum support is fine, I don't think my sketch for
 providing extensible compression is.

I didn't really care about doing (and don't really want to do) both
things in the same patch.  I just didn't want the patch to shut the
door to extensible compression in the future.

 Important questions are:
 1) Which algorithms do we want? I think snappy is a good candidate but I
 mostly chose it because it's BSD licenced, widely used, and has a C
 implementation with a useable API. LZ4 might be another interesting
 choice. Another slower algorithm with higher compression ratio
 would also be a good idea for many applications.

I have no opinion on this.

 2) Do we want to build infrastructure for more than 3 compression
 algorithms? We could delay that decision till we add the 3rd.

I think we should leave the door open, but I don't have a compelling
desire to get too baroque for v1.  Still, maybe if the first byte has
a 1 in the high-bit, the next 7 bits should be defined as specifying a
compression algorithm.  3 compression algorithms would probably last
us a while; but 127 should last us, in effect, forever.

 3) Surely choosing the compression algorithm via GUC ala SET
 toast_compression_algo = ... isn't the way to go. I'd say a storage
 attribute is more appropriate?

The way we do caching right now supposes that attoptions will be
needed only occasionally.  It might need to be revised if we're going
to need it all the time.  Or else we might need to use a dedicated
pg_class column.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] extensible external toast tuple support snappy prototype

2013-06-07 Thread Andres Freund
On 2013-06-07 10:04:15 -0400, Robert Haas wrote:
 On Wed, Jun 5, 2013 at 11:01 AM, Andres Freund and...@2ndquadrant.com wrote:
  On 2013-05-31 23:42:51 -0400, Robert Haas wrote:
   This should allow for fairly easy development of a new compression
   scheme for out-of-line toast tuples. It will *not* work for compressed
   inline tuples (i.e. VARATT_4B_C). I am not convinced that that is a
   problem or that if it is, that it cannot be solved separately.
 
  Seems pretty sensible to me.  The patch is obviously WIP but the
  direction seems fine to me.
 
  So, I played a bit more with this, with an eye towards getting this into
  a non WIP state, but: While I still think the method for providing
  indirect external Datum support is fine, I don't think my sketch for
  providing extensible compression is.
 
 I didn't really care about doing (and don't really want to do) both
 things in the same patch.  I just didn't want the patch to shut the
 door to extensible compression in the future.

Oh. I don't want to actually commit it in the same patch either. But to
keep the road for extensible compression open we kinda need to know what
the way to do that is. Turns out it's an independent thing that doesn't
reuse any of the respective infrastructures.

I only went so far to actually implement the compression because a) my
previous thoughts about how it could work were bogus b) it was fun.

Turns out the benefits are imo big enough to make it worth pursuing
further.

  2) Do we want to build infrastructure for more than 3 compression
  algorithms? We could delay that decision till we add the 3rd.
 
 I think we should leave the door open, but I don't have a compelling
 desire to get too baroque for v1.  Still, maybe if the first byte has
 a 1 in the high-bit, the next 7 bits should be defined as specifying a
 compression algorithm.  3 compression algorithms would probably last
 us a while; but 127 should last us, in effect, forever.

The problem is that to discern from pglz on little endian the byte with
the two high bits unset is actually the fourth byte in a toast datum. So
we would need to store it in the 5th byte or invent some more
complicated encoding scheme.

So I think we should just define '00' as pglz, '01' as xxx, '10' as yyy
and '11' as storing the schema in the next byte.

  3) Surely choosing the compression algorithm via GUC ala SET
  toast_compression_algo = ... isn't the way to go. I'd say a storage
  attribute is more appropriate?
 
 The way we do caching right now supposes that attoptions will be
 needed only occasionally.  It might need to be revised if we're going
 to need it all the time.  Or else we might need to use a dedicated
 pg_class column.

Good point. It probably belongs right besides attstorage, seems to be
the most consistent choice anyway.

Alternatively, if we only add one form of compression, we can just
always store in snappy/lz4/

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] extensible external toast tuple support snappy prototype

2013-06-07 Thread Robert Haas
On Fri, Jun 7, 2013 at 10:30 AM, Andres Freund and...@2ndquadrant.com wrote:
 Turns out the benefits are imo big enough to make it worth pursuing
 further.

Yeah, those were nifty numbers.

 The problem is that to discern from pglz on little endian the byte with
 the two high bits unset is actually the fourth byte in a toast datum. So
 we would need to store it in the 5th byte or invent some more
 complicated encoding scheme.

 So I think we should just define '00' as pglz, '01' as xxx, '10' as yyy
 and '11' as storing the schema in the next byte.

Not totally following, but I'm fine with that.

  3) Surely choosing the compression algorithm via GUC ala SET
  toast_compression_algo = ... isn't the way to go. I'd say a storage
  attribute is more appropriate?

 The way we do caching right now supposes that attoptions will be
 needed only occasionally.  It might need to be revised if we're going
 to need it all the time.  Or else we might need to use a dedicated
 pg_class column.

 Good point. It probably belongs right besides attstorage, seems to be
 the most consistent choice anyway.

Possibly, we could even store it in attstorage.  We're really only
using two bits of that byte right now, so just invent some more
letters.

 Alternatively, if we only add one form of compression, we can just
 always store in snappy/lz4/

Not following.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] extensible external toast tuple support snappy prototype

2013-06-07 Thread Andres Freund
On 2013-06-07 10:44:24 -0400, Robert Haas wrote:
 On Fri, Jun 7, 2013 at 10:30 AM, Andres Freund and...@2ndquadrant.com wrote:
  Turns out the benefits are imo big enough to make it worth pursuing
  further.
 
 Yeah, those were nifty numbers.
 
  The problem is that to discern from pglz on little endian the byte with
  the two high bits unset is actually the fourth byte in a toast datum. So
  we would need to store it in the 5th byte or invent some more
  complicated encoding scheme.
 
  So I think we should just define '00' as pglz, '01' as xxx, '10' as yyy
  and '11' as storing the schema in the next byte.
 
 Not totally following, but I'm fine with that.

Currently on a little endian system the pglz header contains the length
in the first four bytes as:
[][][][xxdd]
Where dd are valid length bits for pglz and xx are the two bits which
are always zero since we only ever store up to 1GB. We can redefine 'xx'
to mean whatever we want but we cannot change it's place.

   3) Surely choosing the compression algorithm via GUC ala SET
   toast_compression_algo = ... isn't the way to go. I'd say a storage
   attribute is more appropriate?
 
  The way we do caching right now supposes that attoptions will be
  needed only occasionally.  It might need to be revised if we're going
  to need it all the time.  Or else we might need to use a dedicated
  pg_class column.
 
  Good point. It probably belongs right besides attstorage, seems to be
  the most consistent choice anyway.
 
 Possibly, we could even store it in attstorage.  We're really only
 using two bits of that byte right now, so just invent some more
 letters.

Hm. Possible, but I don't think that's worth it. There's a padding byte
before attinhcount anyway.
Storing the preferred location in attstorage (plain, preferred-internal,
external, preferred-external) separately from the compression seems to
make sense to me.

  Alternatively, if we only add one form of compression, we can just
  always store in snappy/lz4/
 
 Not following.

I mean, we don't necessarily need to make it configurable if we just add
one canonical new better compression format. I am not sure that's
sufficient since I can see usecases for 'very fast but not too well
compressed' and 'very well compressed but slow', but I am personally not
really interested in the second case, so ...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] extensible external toast tuple support snappy prototype

2013-06-07 Thread Hannu Krosing
On 06/07/2013 04:54 PM, Andres Freund wrote:

 I mean, we don't necessarily need to make it configurable if we just add
 one canonical new better compression format. I am not sure that's
 sufficient since I can see usecases for 'very fast but not too well
 compressed' and 'very well compressed but slow', but I am personally not
 really interested in the second case, so ...
As DE-comression is often still fast for slow-but-good compression,
the obvious use case for 2nd is read-mostly data

 Greetings,

 Andres Freund



-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] extensible external toast tuple support snappy prototype

2013-06-07 Thread Andres Freund
On 2013-06-07 17:27:28 +0200, Hannu Krosing wrote:
 On 06/07/2013 04:54 PM, Andres Freund wrote:
 
  I mean, we don't necessarily need to make it configurable if we just add
  one canonical new better compression format. I am not sure that's
  sufficient since I can see usecases for 'very fast but not too well
  compressed' and 'very well compressed but slow', but I am personally not
  really interested in the second case, so ...

 As DE-comression is often still fast for slow-but-good compression,
 the obvious use case for 2nd is read-mostly data

Well. Those algorithms still are up to magnitude or so slower
decompressing than something like snappy, lz4 or even pglz while the
compression ratio usually is only like 50-80% improved... So you really
need a good bit of compressible data (so the amount of storage actually
hurts) that you don't read all that often (since you then would
bottleneck on compression too often).
That's just not something I run across to regularly.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] extensible external toast tuple support snappy prototype

2013-06-07 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 I mean, we don't necessarily need to make it configurable if we just add
 one canonical new better compression format. I am not sure that's
 sufficient since I can see usecases for 'very fast but not too well
 compressed' and 'very well compressed but slow', but I am personally not
 really interested in the second case, so ...

IME, once we've changed it once, the odds that we'll want to change it
again go up drastically.  I think if we're going to make a change here
we should leave room for future revisions.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] extensible external toast tuple support snappy prototype

2013-06-07 Thread Andres Freund
On 2013-06-07 11:46:45 -0400, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  I mean, we don't necessarily need to make it configurable if we just add
  one canonical new better compression format. I am not sure that's
  sufficient since I can see usecases for 'very fast but not too well
  compressed' and 'very well compressed but slow', but I am personally not
  really interested in the second case, so ...
 
 IME, once we've changed it once, the odds that we'll want to change it
 again go up drastically.  I think if we're going to make a change here
 we should leave room for future revisions.

The above bit was just about how much control we give the user over the
compression algorithm used for compressing new data. If we just add one
method atm which we think is just about always better than pglz there's
not much need to provide the tunables already.

I don't think there's any question over the fact that we should leave
room on the storage level to reasonably easy add new compression
algorithms without requiring on-disk changes.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] extensible external toast tuple support snappy prototype

2013-06-07 Thread Alvaro Herrera
Andres Freund escribió:

 2) Combined patch that adds indirect tuple and snappy compression. Those
 coul be separated, but this is an experiment so far...

Can we have a separate header for toast definitions? (i.e. split them
out of postgres.h)

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] extensible external toast tuple support snappy prototype

2013-06-07 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2013-06-07 11:46:45 -0400, Tom Lane wrote:
 IME, once we've changed it once, the odds that we'll want to change it
 again go up drastically.  I think if we're going to make a change here
 we should leave room for future revisions.

 The above bit was just about how much control we give the user over the
 compression algorithm used for compressing new data. If we just add one
 method atm which we think is just about always better than pglz there's
 not much need to provide the tunables already.

Ah, ok, I thought you were talking about storage-format decisions not
about whether to expose a tunable setting.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] extensible external toast tuple support snappy prototype

2013-06-07 Thread Hannu Krosing
On 06/07/2013 05:38 PM, Andres Freund wrote:
 On 2013-06-07 17:27:28 +0200, Hannu Krosing wrote:
 On 06/07/2013 04:54 PM, Andres Freund wrote:
 I mean, we don't necessarily need to make it configurable if we just add
 one canonical new better compression format. I am not sure that's
 sufficient since I can see usecases for 'very fast but not too well
 compressed' and 'very well compressed but slow', but I am personally not
 really interested in the second case, so ...
 As DE-comression is often still fast for slow-but-good compression,
 the obvious use case for 2nd is read-mostly data
 Well. Those algorithms still are up to magnitude or so slower
 decompressing than something like snappy, lz4 or even pglz while the
 compression ratio usually is only like 50-80% improved... So you really
 need a good bit of compressible data (so the amount of storage actually
 hurts) that you don't read all that often (since you then would
 bottleneck on compression too often).
 That's just not something I run across to regularly.
While the difference in compression speeds between algorithms is
different, it may be more then offset in favour of better compression
if there is real I/O involved as exemplified here:
http://www.citusdata.com/blog/64-zfs-compression

-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] extensible external toast tuple support snappy prototype

2013-06-07 Thread Stephen Frost
* Andres Freund (and...@2ndquadrant.com) wrote:
 Currently on a little endian system the pglz header contains the length
 in the first four bytes as:
 [][][][xxdd]
 Where dd are valid length bits for pglz and xx are the two bits which
 are always zero since we only ever store up to 1GB. We can redefine 'xx'
 to mean whatever we want but we cannot change it's place.

I'm not thrilled with the idea of using those 2 bits from the length
integer.  I understand the point of it and that we'd be able to have
binary compatibility from it but is it necessary to track at the
per-tuple level..?  What about possibly supporting 1GB objects at some
point (yes, I know there's a lot of other issues there, but still).
We've also got complexity around the size of the length integer already.

Anyway, just not 100% sure that we really want to use these bits for
this.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] extensible external toast tuple support snappy prototype

2013-06-07 Thread Andres Freund
On 2013-06-07 12:16:48 -0400, Stephen Frost wrote:
 * Andres Freund (and...@2ndquadrant.com) wrote:
  Currently on a little endian system the pglz header contains the length
  in the first four bytes as:
  [][][][xxdd]
  Where dd are valid length bits for pglz and xx are the two bits which
  are always zero since we only ever store up to 1GB. We can redefine 'xx'
  to mean whatever we want but we cannot change it's place.
 
 I'm not thrilled with the idea of using those 2 bits from the length
 integer.  I understand the point of it and that we'd be able to have
 binary compatibility from it but is it necessary to track at the
 per-tuple level..?  What about possibly supporting 1GB objects at some
 point (yes, I know there's a lot of other issues there, but still).
 We've also got complexity around the size of the length integer already.

I am open for different suggestions, but I don't know of any realistic
ones.
Note that the 1GB limitation is already pretty heavily baked in into
varlenas itself (which is not the length we are talking about here!)
since we use the two remaining bits discern between the 4 types of
varlenas we have.

* short (1b)
* short, pointing to a ondisk tuple (1b_e)
* long (4b)
* long compressed (4b_c)

Since long compressed ones always need to be convertible to long ones we
can't ever have a 'rawsize' (which is what's proposed to be used for
this) that's larger than 1GB.

So, breaking the 1GB limit will not be stopped by this, but much much
earlier. And it will require a break in on-disk compatibility.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] extensible external toast tuple support

2013-05-31 Thread Robert Haas
On Thu, May 30, 2013 at 7:42 AM, Andres Freund and...@2ndquadrant.com wrote:
 In
 http://archives.postgresql.org/message-id/20130216164231.GA15069%40awork2.anarazel.de
 I presented the need for 'indirect' toast tuples which point into memory
 instead of a toast table. In the comments to that proposal, off-list and
 in-person talks the wish to make that a more general concept has
 been voiced.

 The previous patch used varattrib_1b_e.va_len_1be to discern between
 different types of external tuples. That obviously only works if the
 data sizes of all possibly stored datum types are distinct which isn't
 nice. So what the newer patch now does is to rename that field into
 'va_tag' and decide based on that what kind of Datum we have. To get the
 actual length of that datum there now is a VARTAG_SIZE() macro which
 maps the tags back to size.
 To keep on-disk compatibility the size of an external toast tuple
 containing a varatt_external is used as its tag value.

 This should allow for fairly easy development of a new compression
 scheme for out-of-line toast tuples. It will *not* work for compressed
 inline tuples (i.e. VARATT_4B_C). I am not convinced that that is a
 problem or that if it is, that it cannot be solved separately.

 FWIW, in some quick microbenchmarks I couldn't find any performance
 difference due to the slightly more complex size computation which I do
 *not* find surprising.

 Opinions?

Seems pretty sensible to me.  The patch is obviously WIP but the
direction seems fine to me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] extensible external toast tuple support

2013-05-30 Thread Andres Freund
Hi,

In
http://archives.postgresql.org/message-id/20130216164231.GA15069%40awork2.anarazel.de
I presented the need for 'indirect' toast tuples which point into memory
instead of a toast table. In the comments to that proposal, off-list and
in-person talks the wish to make that a more general concept has
been voiced.

The previous patch used varattrib_1b_e.va_len_1be to discern between
different types of external tuples. That obviously only works if the
data sizes of all possibly stored datum types are distinct which isn't
nice. So what the newer patch now does is to rename that field into
'va_tag' and decide based on that what kind of Datum we have. To get the
actual length of that datum there now is a VARTAG_SIZE() macro which
maps the tags back to size.
To keep on-disk compatibility the size of an external toast tuple
containing a varatt_external is used as its tag value.

This should allow for fairly easy development of a new compression
scheme for out-of-line toast tuples. It will *not* work for compressed
inline tuples (i.e. VARATT_4B_C). I am not convinced that that is a
problem or that if it is, that it cannot be solved separately.

FWIW, in some quick microbenchmarks I couldn't find any performance
difference due to the slightly more complex size computation which I do
*not* find surprising.

Opinions?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services
From 43416ee71033a1bd12bec5e651ff45ea9eeafd56 Mon Sep 17 00:00:00 2001
From: Andres Freund and...@anarazel.de
Date: Sun, 17 Feb 2013 01:38:17 +0100
Subject: [PATCH] Add support for multiple kinds of external toast datums

There are several usecases where our current representation of external toast
datums is limiting:
* adding new compression schemes
* avoidance of repeated detoasting
* externally decoded toast tuples

For that support 'tags' on external (varattrib_1b_e) varlenas which recoin the
current va_len_1be field to store the tag (or type) of a varlena. To determine
the actual length a macro VARTAG_SIZE(tag) is added which can be used to map
from a tag to the actual length.

This patch adds support for 'indirect' tuples which point to some externally
allocated memory containing a toast tuple. It also implements the stub for a
different compression algorithm.
---
 src/backend/access/heap/tuptoaster.c | 115 ---
 src/include/postgres.h   | 100 +++---
 2 files changed, 183 insertions(+), 32 deletions(-)

diff --git a/src/backend/access/heap/tuptoaster.c b/src/backend/access/heap/tuptoaster.c
index fc37ceb..46c7cf4 100644
--- a/src/backend/access/heap/tuptoaster.c
+++ b/src/backend/access/heap/tuptoaster.c
@@ -128,7 +128,7 @@ heap_tuple_fetch_attr(struct varlena * attr)
 struct varlena *
 heap_tuple_untoast_attr(struct varlena * attr)
 {
-	if (VARATT_IS_EXTERNAL(attr))
+	if (VARATT_IS_EXTERNAL_OLDSTYLE(attr))
 	{
 		/*
 		 * This is an externally stored datum --- fetch it back from there
@@ -145,6 +145,19 @@ heap_tuple_untoast_attr(struct varlena * attr)
 			pfree(tmp);
 		}
 	}
+	else if (VARATT_IS_EXTERNAL_COMPRESSED(attr))
+	{
+		elog(ERROR, not yet);
+	}
+	else if (VARATT_IS_EXTERNAL_INDIRECT(attr))
+	{
+		struct varatt_indirect redirect;
+		VARATT_EXTERNAL_GET_POINTER(redirect, attr);
+		attr = (struct varlena *)redirect.pointer;
+		Assert(!VARATT_IS_EXTERNAL_INDIRECT(attr));
+
+		attr = heap_tuple_untoast_attr(attr);
+	}
 	else if (VARATT_IS_COMPRESSED(attr))
 	{
 		/*
@@ -191,7 +204,7 @@ heap_tuple_untoast_attr_slice(struct varlena * attr,
 	char	   *attrdata;
 	int32		attrsize;
 
-	if (VARATT_IS_EXTERNAL(attr))
+	if (VARATT_IS_EXTERNAL_OLDSTYLE(attr))
 	{
 		struct varatt_external toast_pointer;
 
@@ -204,6 +217,13 @@ heap_tuple_untoast_attr_slice(struct varlena * attr,
 		/* fetch it back (compressed marker will get set automatically) */
 		preslice = toast_fetch_datum(attr);
 	}
+	else if (VARATT_IS_EXTERNAL_INDIRECT(attr))
+	{
+		struct varatt_indirect redirect;
+		VARATT_EXTERNAL_GET_POINTER(redirect, attr);
+		return heap_tuple_untoast_attr_slice(redirect.pointer,
+			 sliceoffset, slicelength);
+	}
 	else
 		preslice = attr;
 
@@ -267,7 +287,7 @@ toast_raw_datum_size(Datum value)
 	struct varlena *attr = (struct varlena *) DatumGetPointer(value);
 	Size		result;
 
-	if (VARATT_IS_EXTERNAL(attr))
+	if (VARATT_IS_EXTERNAL_OLDSTYLE(attr))
 	{
 		/* va_rawsize is the size of the original datum -- including header */
 		struct varatt_external toast_pointer;
@@ -275,6 +295,17 @@ toast_raw_datum_size(Datum value)
 		VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
 		result = toast_pointer.va_rawsize;
 	}
+	else if (VARATT_IS_EXTERNAL_COMPRESSED(attr))
+	{
+		elog(ERROR, not yet);
+	}
+	else if (VARATT_IS_EXTERNAL_INDIRECT(attr))
+	{
+		struct varatt_indirect toast_pointer;
+
+		VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
+		return