Re: RFC: Pluggable TOAST

2023-11-15 Thread Matthias van de Meent
On Tue, 14 Nov 2023, 14:12 Nikita Malakhov wrote:
>
> Hi!
>
> Matthias, regarding your message above, I have a question to ask.
> On typed TOAST implementations - we thought that TOAST method used
> for storing data could depend not only on data type, but also on the flow or
> workload, like our bytea appendable toaster, which is much (hundreds of times)
> faster on update compared to the regular procedure. That was one of the ideas
> behind the
> Pluggable TOAST - we can choose the most suitable TOAST implementation
> available.
>
> If we have a single TOAST entry point for data type - then we should have
> some means to control it or choose a TOAST method suitable to our needs.
> Or should not?

I'm not sure my interpretation of the question is correct, but I'll
assume it's "would you want something like STORAGE
[plain/external/...] for controlling type-specific toast operations?".

I don't see many reasons why we'd need a system to disable (some of)
those features, with the only one being "the workload is mostly
read-only of the full attributes, so any performance overhead of
type-aware detoasting is not worth the temporary space savings during
updates". So, while I do think there would be good reasons for typed
toasting to be disabled, I don't see a good reason for only specific
parts of type-specific toasting to be disabled (no reason for 'disable
the append optimization for bytea, but not the splice optimization').

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)




Re: RFC: Pluggable TOAST

2023-11-14 Thread Nikita Malakhov
Hi!

Matthias, regarding your message above, I have a question to ask.
On typed TOAST implementations - we thought that TOAST method used
for storing data could depend not only on data type, but also on the flow or
workload, like our bytea appendable toaster, which is much (hundreds of times)
faster on update compared to the regular procedure. That was one of the ideas
behind the
Pluggable TOAST - we can choose the most suitable TOAST implementation
available.

If we have a single TOAST entry point for data type - then we should have
some means to control it or choose a TOAST method suitable to our needs.
Or should not?

-- 
Regards,
Nikita Malakhov
Postgres Professional
The Russian Postgres Company
https://postgrespro.ru/


Re: RFC: Pluggable TOAST

2023-11-07 Thread Matthias van de Meent
On Tue, 7 Nov 2023 at 11:06, Nikita Malakhov wrote:
>
> Hi,
>
> I've been thinking about Matthias' proposals for some time and have some
> questions:
>
> >So, in short, I don't think there is a need for a specific "Pluggable
> >toast API" like the one in the patchset at [0] that can be loaded
> >on-demand, but I think that updating our current TOAST system to a
> >system for which types can provide support functions would likely be
> >quite beneficial, for efficient extraction of data from composite
> >values.
>
> As I understand, one of the reasons against Pluggable TOAST is that
> differences in plugged-in Toasters could result in incompatibility even
> between different versions of the same DB.

That could be part of it, but it definitely wasn't my primary concern.
The primary concern remains that the pluggable toaster patch made the
jsonb type expose an API for a pluggable toaster that for all intents
and purposes only has one implementation due to its API being
specifically tailored for the jsonb internals use case, with similar
type-specific API bindings getting built for other types, each having
strict expectations about the details of the implementation. I agree
that it makes sense to specialize TOASTing for jsonb, but what I don't
understand about it is why that would need to be achieved outside the
core jsonb code.

I understand that the 'pluggable toaster' APIs originate from one of
PostgresPRO's forks of PostgreSQL, and I think it shows. That's not to
say it's bad, but it seems to be built on different expectations:
When maintaining a fork, you have different tradeoffs when compared to
maintaining the main product. A fork's changes need to be covered
across many versions with unknown changes, thus you would want the
smallest possible changes to enable the feature - pluggable toast makes
sense here, as the changes are limited to a few jsonb internals, but
most complex code is in an extension.
However, for core PostgreSQL, I think this separation makes very
little sense: the complexity of maintaining a toast api for each type
(when there can be expected to be only one implementation) is much
more work than just building a good set of helper functions that do
that same job. It allows for more flexibility, as there is no
noticeable black box api implementation to keep track of.

> The importance of a correct TOAST update is beyond question; I feel I have
> to prepare a patch for it. There are some questions though, I'd address them
> later with a patch.
>
> >Example support functions:
>
> >/* TODO: bikeshedding on names, signatures, further support functions. */
> >Datum typsup_roastsliceofbread(Datum ptr, int sizetarget, char cmethod)
> >Datum typsup_unroastsliceofbread(Datum ptr)
> >void typsup_releaseroastedsliceofbread(Datump ptr) /* in case of
> >non-unitary in-memory datums */
>
> Do I understand correctly that you mean extending PG_TYPE and the type cache
> by adding a new function set for toasting/detoasting a value in addition to
> in/out, etc.?

Yes.
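
As a rough illustration of what "types provide support functions" could
look like, here is a minimal, self-contained C sketch. It is not code from
any patch: Datum and Oid are stand-ins so the file compiles on its own, and
TypeToastSupport, lookup_toast_support() and the demo_* routines are
hypothetical names invented for this example. The toy routines only flip a
bit so that the calls are observable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for PostgreSQL's Datum and Oid so the sketch compiles alone. */
typedef uintptr_t Datum;
typedef unsigned int Oid;

/* Hypothetical per-type toast support routines (all names are invented). */
typedef struct TypeToastSupport
{
    Oid   typid;
    Datum (*toast)(Datum value, int size_target, char cmethod);
    Datum (*detoast)(Datum value);
    void  (*release)(Datum value);   /* optional, may be NULL */
} TypeToastSupport;

/* Toy routines: set or clear the low bit instead of doing real work. */
Datum demo_toast(Datum value, int size_target, char cmethod)
{
    (void) size_target;
    (void) cmethod;
    return value | (Datum) 1;
}

Datum demo_detoast(Datum value)
{
    return value & ~(Datum) 1;
}

/* Illustrative type ids: 3802 is jsonb's OID, 17 is bytea's. */
#define DEMO_JSONB_OID 3802
#define DEMO_BYTEA_OID 17

/* Only types that opted in appear here. */
static const TypeToastSupport support_table[] = {
    { DEMO_JSONB_OID, demo_toast, demo_detoast, NULL },
};

/* Return the type's routines, or NULL to fall back to generic TOAST. */
const TypeToastSupport *lookup_toast_support(Oid typid)
{
    size_t n = sizeof(support_table) / sizeof(support_table[0]);

    for (size_t i = 0; i < n; i++)
        if (support_table[i].typid == typid)
            return &support_table[i];
    return NULL;
}
```

A caller would check lookup_toast_support(typid) first and keep using the
existing generic code path whenever it returns NULL, which is what makes the
scheme opt-in per type.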

> I see several issues here:
> 1) We could benefit from knowledge of internals of data being toasted (i.e.
> in case of JSON value with key-value structure) only when EXTERNAL
> storage mode is set, otherwise value will be compressed before toasted.
> So we have to keep both TOAST mechanics regarding the storage mode
> being used. It's the same issue as in Pluggable TOAST. Is it OK?

I think it is OK that the storage-related changes of this only start
once the toast mechanism is

> 2) TOAST pointer is very limited in means of data it keeps, we'd have to
> extend it anyway and keep both for backwards compatibility;

Yes. We already have to retain the current (de)toast infrastructure to
make sure current data files can still be read, given that we want to
retain backward compatibility for currently toasted data.

> 3) There is no API and such an approach would require implementing
> toast and detoast in every data type we want to be custom toasted, resulting
> in multiple files modification. Maybe we have to consider introducing such
> an API?

No. As I mentioned, we can retain the current toast mechanism for
current types that do not yet want to use these new toast APIs. If we
use one different varattrib_1b_e tag for type-owned toast pointers, the
system will be opt-in for types, and types that don't (yet) have
their own toast slicing design will keep using the old all-or-nothing
single-allocation data with the good old compress-then-slice
out-of-line toast storage.
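
A small sketch of how a single extra tag could keep the system opt-in,
assuming the dispatch happens on the pointer's tag byte. The first four tag
values mirror PostgreSQL's existing vartag_external enum; DEMO_VARTAG_TYPE_OWNED
and its value 19 are purely hypothetical, as are DetoastPath and
choose_detoast_path().

```c
#include <assert.h>
#include <stdint.h>

typedef enum DemoVartag
{
    DEMO_VARTAG_INDIRECT    = 1,
    DEMO_VARTAG_EXPANDED_RO = 2,
    DEMO_VARTAG_EXPANDED_RW = 3,
    DEMO_VARTAG_ONDISK      = 18,
    DEMO_VARTAG_TYPE_OWNED  = 19    /* hypothetical, not in PostgreSQL */
} DemoVartag;

typedef enum DetoastPath
{
    PATH_GENERIC,       /* existing compress-then-slice machinery */
    PATH_TYPE_OWNED     /* hand the pointer to the type's own routine */
} DetoastPath;

/* Every existing tag keeps its current handling; only the new tag differs. */
DetoastPath choose_detoast_path(uint8_t tag)
{
    return tag == DEMO_VARTAG_TYPE_OWNED ? PATH_TYPE_OWNED : PATH_GENERIC;
}
```

Existing data files only ever contain the old tags, so they would keep
taking the generic path without any rewrite.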

> 4) 1 toast relation per regular relation. With an update mechanics this will
> be less limiting, but still a limiting factor because 1 entry in base table
> could have a lot of entries in the toast table. Are we doing something with
> this?

I don't think that is relevant to the topic of type-aware toasting
optimization. The toast storage relation growing too large is not
unique to jsonb- or bytea-typed columns, so I believe this is better
solved in a different thread. Ideas 

Re: RFC: Pluggable TOAST

2023-11-07 Thread Nikita Malakhov
Hi,

I've been thinking about Matthias' proposals for some time and have some
questions:

>So, in short, I don't think there is a need for a specific "Pluggable
>toast API" like the one in the patchset at [0] that can be loaded
>on-demand, but I think that updating our current TOAST system to a
>system for which types can provide support functions would likely be
>quite beneficial, for efficient extraction of data from composite
>values.

As I understand, one of the reasons against Pluggable TOAST is that
differences in plugged-in Toasters could result in incompatibility even
between different versions of the same DB.

The importance of a correct TOAST update is beyond question; I feel I have
to prepare a patch for it. There are some questions though, I'd address them
later with a patch.

>Example support functions:

>/* TODO: bikeshedding on names, signatures, further support functions. */
>Datum typsup_roastsliceofbread(Datum ptr, int sizetarget, char cmethod)
>Datum typsup_unroastsliceofbread(Datum ptr)
>void typsup_releaseroastedsliceofbread(Datump ptr) /* in case of
>non-unitary in-memory datums */

Do I understand correctly that you mean extending PG_TYPE and the type cache
by adding a new function set for toasting/detoasting a value in addition to
in/out, etc.?

I see several issues here:
1) We could benefit from knowledge of internals of data being toasted (i.e.
in case of JSON value with key-value structure) only when EXTERNAL
storage mode is set, otherwise value will be compressed before toasted.
So we have to keep both TOAST mechanics regarding the storage mode
being used. It's the same issue as in Pluggable TOAST. Is it OK?

2) TOAST pointer is very limited in means of data it keeps, we'd have to
extend it anyway and keep both for backwards compatibility;

3) There is no API and such an approach would require implementing
toast and detoast in every data type we want to be custom toasted, resulting
in multiple files modification. Maybe we have to consider introducing such
an API?

4) 1 toast relation per regular relation. With an update mechanics this will
be less limiting, but still a limiting factor because 1 entry in base table
could have a lot of entries in the toast table. Are we doing something with
this?

>We would probably want at least 2 more subtypes of varattrib_1b_e -
>one for on-disk pointers, and one for in-memory pointers - where the
>payload of those pointers is managed by the type's toast mechanism and
>considered opaque to the rest of PostgreSQL (and thus not compatible
>with the binary transfer protocol). Types are currently already
>expected to be able to handle their own binary representation, so
>allowing types to manage parts of the toast representation should IMHO
>not be too dangerous, though we should make sure that BINARY COERCIBLE
>types share this toast support routine, or be returned to their
>canonical binary version before they are cast to the coerced type, as
>using different detoasting mechanisms could result in corrupted data
>and thus crashes.

>Lastly, there is the compression part of TOAST. I think it should be
>relatively straightforward to expose the compression-related
>components of TOAST through functions that can then be used by
>type-specific toast support functions.
>Note that this would be opt-in for a type, thus all functions that use
>that type's internals should be aware of the different on-disk format
>for toasted values and should thus be able to handle it gracefully.

Thanks a lot for answers!

-- 
Regards,
Nikita Malakhov
Postgres Professional
The Russian Postgres Company
https://postgrespro.ru/


Re: RFC: Pluggable TOAST

2023-10-26 Thread Nikita Malakhov
Hi!

Matthias, thank you for your patience and explanation. I wish I had had it
much earlier; it would have saved a lot of time.
You've asked a lot of good questions, some of which we cannot yet answer
very satisfactorily, and pointed out some topics that were not mentioned
before. I have to rethink our approach to the TOAST enhancements
accordingly.

Thanks a lot!

--
Regards,
Nikita Malakhov
Postgres Professional
The Russian Postgres Company
https://postgrespro.ru/


Re: RFC: Pluggable TOAST

2023-10-26 Thread Matthias van de Meent
On Thu, 26 Oct 2023 at 15:18, Aleksander Alekseev wrote:
>
> Hi,
>
> > And the goal of *THIS* topic is to gather a picture on how the community
> > sees improvements in TOAST mechanics if it doesn't want it the way we
> > proposed
> > before, to understand which way to go with JSON advanced storage and other
> > enhancements we already have. Previous topic was not of any help here.
>
> Publish your code under an appropriate license first so that 1. anyone
> can test/benchmark it and 2. merge it to the PostgreSQL core if
> necessary.
>
> Or better consider participating in the [1] discussion where we
> reached a consensus on RFC and are working on improving TOAST for JSON
> and other types. We try to be mindful of use cases you named before
> like 64-bit TOAST pointers but we still could use your input.

I feel that the no. 2 proposal is significantly different from the
discussion over at [1] in that it concerns changes in the interface
between types and toast, as opposed to the no. 1 proposal's (and [1]'s)
changes that stay mostly inside the current TOAST APIs and abstractions.

The "Compression dictionaries for JSONB" thread that you linked went
the way of "store and use compression dictionaries for TOAST
compression algorithms", which is at a lower level than one of the
other ideas, which was to "allow JSONB to use a dictionary of common
values to dictionary-encode some of the contained entries". Naive
compression of the Datum's bytes makes the compressed datum
unparseable without decompression, even when dictionaries are used to
decrease the compressed size, while a type's own compression
dictionary substitutions could allow it to maintain its structure and
would thus allow for a lower memory and storage footprint of the
column's datums during query processing.
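
To illustrate the difference, here is a toy C sketch of type-level
dictionary encoding. Everything in it is an assumption made for the
example: key_dict, EncodedPair and the two functions are hypothetical, and a
real jsonb value is of course not an array of string pairs. The point is
only that the encoded form can still be searched directly, whereas a value
compressed as an opaque byte stream must be decompressed before any key
can be found.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical shared dictionary of common keys; array index = code. */
static const char *const key_dict[] = { "id", "name", "created_at" };
#define KEY_DICT_SIZE (sizeof(key_dict) / sizeof(key_dict[0]))

/* An encoded entry stores a small code instead of the key string. */
typedef struct EncodedPair
{
    int         key_code;
    const char *value;
} EncodedPair;

/* A tiny encoded document: {"id": "42", "name": "toast"}. */
const EncodedPair demo_doc[] = {
    { 0, "42" },
    { 1, "toast" },
};

/* Map a key to its dictionary code, or -1 if it is not in the dictionary. */
int dict_code_for_key(const char *key)
{
    for (size_t i = 0; i < KEY_DICT_SIZE; i++)
        if (strcmp(key_dict[i], key) == 0)
            return (int) i;
    return -1;
}

/* Look a value up in the encoded form itself; nothing is decompressed. */
const char *encoded_lookup(const EncodedPair *pairs, size_t npairs,
                           const char *key)
{
    int code = dict_code_for_key(key);

    if (code < 0)
        return NULL;
    for (size_t i = 0; i < npairs; i++)
        if (pairs[i].key_code == code)
            return pairs[i].value;
    return NULL;
}
```

In this sketch each key costs one small integer per entry instead of a full
string, and encoded_lookup() works on the stored representation as-is.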

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)




Re: RFC: Pluggable TOAST

2023-10-26 Thread Matthias van de Meent
On Tue, 24 Oct 2023 at 22:38, Nikita Malakhov wrote:
>
> Hi hackers!
>
> We need community feedback on the previously discussed topic [1].
> There are some long-standing issues in Postgres related to the TOAST
> mechanics, like [2].
> Some time ago we proposed a set of patches with an API allowing different
> TOAST implementations to be plugged into a live database. The patch set
> introduced a lot of code and was quite crude in some places, so after several
> implementations we decided to try it in a production environment for
> further check-up.
>
> The main idea behind Pluggable TOAST is to make it easy to plug in and use
> different implementations of large-value storage, preserving the existing
> mechanics to keep backward compatibility, and to provide an easy,
> Postgres-way means of giving users alternative mechanics for storing large
> column values more effectively. We already have custom and very effective
> (up to tens and even hundreds of times faster) TOAST implementations for
> the bytea and JSONB data types.
>
> As we see it, Pluggable TOAST proposes
> 1) changes to the TOAST pointer itself, extending it to store custom data;
> the current limitations of the TOAST pointer were discussed in [1] and [4];
> 2) an API which allows calls of custom TOAST implementations for certain
> table columns and (a topic for discussion) certain datatypes.
>
> Custom TOAST could also be used in less trivial ways. For example, limited
> columnar storage could be easily implemented and plugged in without heavy
> core modifications or an implementation of Pluggable Storage (Table Access
> Methods), preserving the existing data and database structure, and could be
> upgraded, replicated and so on.
>
> Any thoughts and proposals are welcome.

TLDR of my thoughts below:
1. I don't see much value in the "Pluggable TOAST" as proposed in [0],
where toasters are both decoupled from the type but also strongly
bound to the type with tagged vtables.
2. I do think we should allow *types* to provide their own toast
slicing implementation (not just "one blob, compressed then sliced"),
so that structured types don't have to read MBs of data to access only
a few of the structure's bytes. As this would be a different way of
storing the data, that would likely use a different tag for the
varattrib_1b_e struct to differentiate the two stored formats.
3. I do think that attributes shouldn't be required to be stored
either on disk or in a single palloc-ed area of memory. It is very
expensive to copy such large chunks of memory; jsonb is one such
example. If the type is composite, allow it to be allocated in
multiple regions. This would require a new varattrib_1b_e tag to discern
that the Datum isn't necessarily located in a single memory context,
but with good memory context management that should be fine.
4. I do think that TOAST needs improvements to allow differential
updates, not just full rewrites of the value. I believe this would
likely be enabled through solutions for (2) and (3), even if it might
already be possible without implementing new vartag options.
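
For point 4, a back-of-the-envelope C sketch of why differential updates
matter for appends. It assumes an uncompressed value stored out of line in
fixed-size chunks; the function names are hypothetical and the chunk size
used in the note below (2000 bytes) is just an illustrative round number.
It only counts chunks and does not model WAL, indexes or compression.

```c
#include <assert.h>
#include <stddef.h>

/* Chunks written when the whole value is rewritten after an append. */
size_t chunks_written_full_rewrite(size_t old_size, size_t added,
                                   size_t chunk_size)
{
    return (old_size + added + chunk_size - 1) / chunk_size;
}

/*
 * Chunks written when only the tail is touched: the last, partially
 * filled chunk (if any) plus whatever new chunks the appended bytes need.
 */
size_t chunks_written_append(size_t old_size, size_t added,
                             size_t chunk_size)
{
    size_t first;
    size_t last;

    if (added == 0)
        return 0;
    first = old_size / chunk_size;              /* first chunk touched */
    last = (old_size + added - 1) / chunk_size; /* last chunk touched */
    return last - first + 1;
}
```

Appending 100 bytes to a 1,000,000-byte value with 2000-byte chunks gives
501 chunks for the full rewrite versus 1 chunk for the append-only path.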

My thoughts:

In my view, the main job of TOAST is:
- To make sure a row with large attributes can still fit on a page by
reducing the size of the representation of attributes in the row
- To allow us to efficiently handle variable-length attribute values
- To reduce the overhead of moving large values through query execution

This is currently implemented through tagged values that contain
exactly one canonical representation of the type (be it inline, inline
compressed, or out of line with or without compression).

Our current implementation assumes that users of the attribute will
either always use the decompressed canonical representation, or not
care about the representation at all (except decompression of only
prefixes, which is a special case), but this is clearly not the case:
Composite values like ROW types clearly benefit from careful
partitioning and subdivision of values into self-contained compressed
chunks: We don't TOAST a table's rows, but do TOAST per attribute.
JSONB could also benefit if it could create its own on-disk format of
a value: benchmarks of the "Pluggable Toaster" patch have shown that
JSONB operation performance improved significantly with custom toaster
infrastructure.

So, if composite types (like JSONB, ROW and ARRAY) would be able to
manually slice their values and create their own representation of
that toasted value, then that would probably benefit the system by
allowing some data to be stored in a more accessible manner than
"everything inline, compressed, or out-of-line, detoast (a prefix of)
all data, or none of it, no partial detoasting".
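
A minimal sketch of the arithmetic behind partial detoasting, assuming the
type knows the byte range of the sub-value it wants and the value is stored
uncompressed in fixed-size chunks. ChunkRange, chunks_for_slice() and
chunks_in_value() are hypothetical helpers written for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Inclusive range of chunk numbers. */
typedef struct ChunkRange
{
    size_t first;
    size_t last;
} ChunkRange;

/* Chunks needed to read `length` bytes starting at byte `offset`. */
ChunkRange chunks_for_slice(size_t offset, size_t length, size_t chunk_size)
{
    ChunkRange r;

    assert(length > 0 && chunk_size > 0);
    r.first = offset / chunk_size;
    r.last = (offset + length - 1) / chunk_size;
    return r;
}

/* Chunks needed to read the whole value. */
size_t chunks_in_value(size_t total_size, size_t chunk_size)
{
    return (total_size + chunk_size - 1) / chunk_size;
}
```

With 2000-byte chunks (an illustrative size), a 100-byte field at offset
4500 of a 4,000,000-byte value lives entirely in chunk 2, while reading the
whole value touches 2000 chunks.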


Now, returning to the table-level TOAST task of making sure the
tuple's data fits on the page, compressing & out-of-line-ing the data
until it fits:

Things that it currently does: varlena values are compressed and
out-of-lined with generic compression algorithms and a naive
slice-and-dice 

Re: RFC: Pluggable TOAST

2023-10-26 Thread Aleksander Alekseev
Hi,

> And the goal of *THIS* topic is to gather a picture on how the community sees
> improvements in TOAST mechanics if it doesn't want it the way we proposed
> before, to understand which way to go with JSON advanced storage and other
> enhancements we already have. Previous topic was not of any help here.

Publish your code under an appropriate license first so that 1. anyone
can test/benchmark it and 2. merge it to the PostgreSQL core if
necessary.

Or better consider participating in the [1] discussion where we
reached a consensus on RFC and are working on improving TOAST for JSON
and other types. We try to be mindful of use cases you named before
like 64-bit TOAST pointers but we still could use your input.

You know all this.

[1]: 
https://www.postgresql.org/message-id/flat/CAJ7c6TOtAB0z1UrksvGTStNE-herK-43bj22%3D5xVBg7S4vr5rQ%40mail.gmail.com
-- 
Best regards,
Aleksander Alekseev




Re: RFC: Pluggable TOAST

2023-10-26 Thread Nikita Malakhov
Hi,

I meant a discussion preceding the patch set - there was none.

And the goal of *THIS* topic is to gather a picture of how the community sees
improvements in TOAST mechanics if it doesn't want it the way we proposed
before, to understand which way to go with JSON advanced storage and other
enhancements we already have. The previous topic was not of any help here.

--
Regards,
Nikita Malakhov
Postgres Professional
The Russian Postgres Company
https://postgrespro.ru/


Re: RFC: Pluggable TOAST

2023-10-26 Thread Aleksander Alekseev
Hi,

> Aleksander, the previous discussion was not actually a discussion: we proposed
> a set of big and complex core changes without any discussion preceding it.
> That was not a very good approach, although the overall idea behind the patch
> set is very progressive and is ready to solve some old and painful issues
> in Postgres.

Not true.

There *was* a discussion and you are aware of all the problems that
were pointed out. Most importantly [1][2]. Also you followed the
thread [3] and are well aware that we want to implement TOAST
improvements in PostgreSQL core.

Despite all this you are still insisting on the extendable design as
if starting a new thread every year or so will change something.

[1]: 
https://www.postgresql.org/message-id/20230205223313.4dwhlddzg6uhaztg%40alap3.anarazel.de
[2]: 
https://www.postgresql.org/message-id/CAJ7c6TOsHtGkup8AVnLTGGt-%2B7EzE2j-cFGr12U37pzGEsU6Fg%40mail.gmail.com
[3]: 
https://www.postgresql.org/message-id/flat/CAJ7c6TOtAB0z1UrksvGTStNE-herK-43bj22%3D5xVBg7S4vr5rQ%40mail.gmail.com
-- 
Best regards,
Aleksander Alekseev




Re: RFC: Pluggable TOAST

2023-10-26 Thread Nikita Malakhov
Hi,

Aleksander, the previous discussion was not actually a discussion: we proposed
a set of big and complex core changes without any discussion preceding it.
That was not a very good approach, although the overall idea behind the patch
set is very progressive and is ready to solve some old and painful issues
in Postgres.

Also, the introduction of SQL/JSON will further boost the usage of JSON in
databases, so our improvements in JSON storage and performance would be very
useful. These improvements depend on Pluggable TOAST: without an API that
makes it easy to plug in different TOAST implementations, they require heavy
core modifications and are very unlikely to be accepted. Not to mention that
changes of that kind require upgrades, restarts and so on.

Pluggable TOAST allows using advanced storage techniques on top of the
default Postgres database engine, instead of implementing the complex
Pluggable Storage API, and allows plugging these advanced techniques in on
the fly, without even restarting the server, which is crucial for production
systems.

The discussion on extending the TOAST pointer showed some interest in this
topic, so I hope this feature will draw some attention given how widely
large JSON objects are used.

--
Regards,
Nikita Malakhov
Postgres Professional
The Russian Postgres Company
https://postgrespro.ru/


Re: RFC: Pluggable TOAST

2023-10-25 Thread Aleksander Alekseev
Hi Nikita,

> We need community feedback on the previously discussed topic [1].
> There are some long-standing issues in Postgres related to the TOAST
> mechanics, like [2].
> Some time ago we proposed a set of patches with an API allowing different
> TOAST implementations to be plugged into a live database. The patch set
> introduced a lot of code and was quite crude in some places, so after several
> implementations we decided to try it in a production environment for
> further check-up.
>
> The main idea behind Pluggable TOAST is to make it easy to plug in and use
> different implementations of large-value storage, preserving the existing
> mechanics to keep backward compatibility, and to provide an easy,
> Postgres-way means of giving users alternative mechanics for storing large
> column values more effectively. We already have custom and very effective
> (up to tens and even hundreds of times faster) TOAST implementations for
> the bytea and JSONB data types.
>
> As we see it, Pluggable TOAST proposes
> 1) changes to the TOAST pointer itself, extending it to store custom data;
> the current limitations of the TOAST pointer were discussed in [1] and [4];
> 2) an API which allows calls of custom TOAST implementations for certain
> table columns and (a topic for discussion) certain datatypes.
>
> Custom TOAST could also be used in less trivial ways. For example, limited
> columnar storage could be easily implemented and plugged in without heavy
> core modifications or an implementation of Pluggable Storage (Table Access
> Methods), preserving the existing data and database structure, and could be
> upgraded, replicated and so on.
>
> Any thoughts and proposals are welcome.

It seems to me that discarding the previous discussion and starting a
new thread where you ask the community for feedback *again* is not
going to be productive. Pretty sure the feedback is not going to change.

-- 
Best regards,
Aleksander Alekseev




RFC: Pluggable TOAST

2023-10-24 Thread Nikita Malakhov
Hi hackers!

We need community feedback on the previously discussed topic [1].
There are some long-standing issues in Postgres related to the TOAST
mechanics, like [2].
Some time ago we proposed a set of patches with an API allowing different
TOAST implementations to be plugged into a live database. The patch set
introduced a lot of code and was quite crude in some places, so after several
implementations we decided to try it in a production environment for
further check-up.

The main idea behind Pluggable TOAST is to make it easy to plug in and use
different implementations of large-value storage, preserving the existing
mechanics to keep backward compatibility, and to provide an easy,
Postgres-way means of giving users alternative mechanics for storing large
column values more effectively. We already have custom and very effective
(up to tens and even hundreds of times faster) TOAST implementations for
the bytea and JSONB data types.

As we see it, Pluggable TOAST proposes
1) changes to the TOAST pointer itself, extending it to store custom data;
the current limitations of the TOAST pointer were discussed in [1] and [4];
2) an API which allows calls of custom TOAST implementations for certain
table columns and (a topic for discussion) certain datatypes.

Custom TOAST could also be used in less trivial ways. For example, limited
columnar storage could be easily implemented and plugged in without heavy
core modifications or an implementation of Pluggable Storage (Table Access
Methods), preserving the existing data and database structure, and could be
upgraded, replicated and so on.

Any thoughts and proposals are welcome.

[1] Pluggable TOAST
https://www.postgresql.org/message-id/flat/224711f9-83b7-a307-b17f-4457ab73aa0a%40sigaev.ru

[2] Infinite loop while acquiring new TOAST Oid
https://www.postgresql.org/message-id/flat/CAN-LCVPRvRzxeUdYdDCZ7UwZQs1NmZpqBUCd%3D%2BRdMPFTyt-bRQ%40mail.gmail.com

[3] JSONB Compression dictionaries
https://www.postgresql.org/message-id/flat/CAJ7c6TOtAB0z1UrksvGTStNE-herK-43bj22%3D5xVBg7S4vr5rQ%40mail.gmail.com

[4] Extending the TOAST pointer
https://www.postgresql.org/message-id/flat/CAN-LCVMq2X%3Dfhx7KLxfeDyb3P%2BBXuCkHC0g%3D9GF%2BJD4izfVa0Q%40mail.gmail.com
-- 
Regards,
Nikita Malakhov
Postgres Professional
The Russian Postgres Company
https://postgrespro.ru/