On Tue, Mar 14, 2017 at 10:03 PM, George Papadrosou <gpapadro...@gmail.com> wrote:
> The project's idea is to implement different slicing approaches according
> to the value's datatype. For example, a text field could be split on
> character boundaries, while a JSON document would be split in a way that
> allows fast access to its keys or values.
Hmm. So if you had a long text field containing multibyte characters, and you split it after, say, every 1024 characters rather than after every N bytes, then you could do substr() without detoasting the whole field. On the other hand, my guess is that you'd waste a fair amount of space in the TOAST table, because it's unlikely that the chunks would be exactly the right size to fill every page of the table completely. On balance it seems like you'd be worse off, because substr() probably isn't all that common an operation.

Now, in contrast, slicing JSON is a very common operation, so a smarter slicing scheme might well pay off. But the question is: what kind of splitting method would actually allow fast access to the keys or values? It strikes me that this might be a difficult problem. Tabula rasa, you could design a serialization format that was aware that it might get toasted and was constructed in such a way as to contain boundaries that are actually referenced from within the format, so that, say, after reading the top-level keys and values, you could know that you next need chunk #103. But unless the existing jsonb binary format was designed with that in mind, it doesn't seem likely to end up being true just by chance.
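To make the character-boundary arithmetic concrete, here is a minimal sketch (plain C, not PostgreSQL code; CHARS_PER_CHUNK and chunks_for_substr are invented for illustration). With fixed-character chunks, a substr() request maps onto a chunk range by pure division; with fixed-byte chunks and a multibyte encoding, you'd have to scan from the start of the value to learn which chunk holds character 5000.

/*
 * Minimal sketch, assuming each TOAST chunk holds a fixed number of
 * characters (1024 here, an arbitrary constant) rather than a fixed
 * number of bytes.
 */
#include <stdio.h>

#define CHARS_PER_CHUNK 1024

static void
chunks_for_substr(long start_char, long nchars,
                  long *first_chunk, long *last_chunk)
{
    /* zero-based character offsets; each chunk covers a fixed range */
    *first_chunk = start_char / CHARS_PER_CHUNK;
    *last_chunk = (start_char + nchars - 1) / CHARS_PER_CHUNK;
}

int
main(void)
{
    long first, last;

    /* a substr() of 300 characters starting at character 5000 */
    chunks_for_substr(5000, 300, &first, &last);
    printf("fetch chunks %ld..%ld only\n", first, last);  /* 4..5 */
    return 0;
}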
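And as for a format containing boundaries referenced from within the format, here's one hypothetical shape such a thing could take (none of these structs exist in PostgreSQL, and jsonb's actual on-disk layout was not built this way): put a directory in chunk 0 mapping each top-level key to the chunk where its value starts.

/*
 * Purely hypothetical layout, for illustration only.
 */
#include <stdint.h>
#include <stddef.h>

typedef struct ToastAwareEntry
{
    uint32_t    key_hash;       /* hash of the top-level key */
    uint32_t    first_chunk;    /* chunk number where the value begins */
    uint32_t    chunk_offset;   /* byte offset of the value in that chunk */
    uint32_t    value_len;      /* value length in bytes */
} ToastAwareEntry;

typedef struct ToastAwareHeader
{
    uint32_t    nkeys;          /* number of top-level keys */
    ToastAwareEntry entries[];  /* directory, read from chunk 0 */
} ToastAwareHeader;

static const ToastAwareEntry *
find_key(const ToastAwareHeader *hdr, uint32_t hash)
{
    for (uint32_t i = 0; i < hdr->nkeys; i++)
        if (hdr->entries[i].key_hash == hash)
            return &hdr->entries[i];
    return NULL;                /* key not present at top level */
}

Fetching one top-level key would then mean reading chunk 0 for the directory and pulling only the chunks named in the matching entry, which is exactly the "you know you next need chunk #103" property.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company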