On Tue, Mar 14, 2017 at 10:03 PM, George Papadrosou <gpapadro...@gmail.com> wrote:
> The project's idea is to implement different slicing approaches according
> to the value's datatype. For example, a text field could be split on
> character boundaries, while a JSON document would be split in a way that
> allows fast access to its keys or values.
Hmm. So if you had a long text field containing multibyte characters, and you split it after, say, every 1024 characters rather than after every N bytes, then you could do substr() without detoasting the whole field. On the other hand, my guess is that you'd waste a fair amount of space in the TOAST table, because it's unlikely that the chunks would be exactly the right size to fill every page of the table completely. On balance it seems like you'd be worse off, because substr() probably isn't all that common an operation.

Now, in contrast, slicing JSON is a very common operation, so a smarter slicing scheme might well pay off. But the question is: what kind of splitting method would actually allow fast access to the keys or values? It strikes me that this might be a difficult problem. Tabula rasa, you could design a serialization format that was aware that it might get toasted and was constructed in such a way as to contain boundaries that are actually referenced from within the format, so that, say, after reading the top-level keys and values, you could know that you next need chunk #103. But unless the existing jsonb binary format was designed with that in mind, it doesn't seem likely to end up being true just by chance.
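To make the character-boundary arithmetic concrete, here is a minimal sketch (plain C, not PostgreSQL code; CHARS_PER_CHUNK and chunks_for_substr are invented for illustration). With fixed-character chunks, a substr() request maps onto a chunk range by pure division; with fixed-byte chunks and a multibyte encoding, you'd have to scan from the start of the value to learn which chunk holds character 5000.

/*
 * Minimal sketch, assuming each TOAST chunk holds a fixed number of
 * characters (1024 here, an arbitrary constant) rather than a fixed
 * number of bytes.
 */
#include <stdio.h>

#define CHARS_PER_CHUNK 1024

static void
chunks_for_substr(long start_char, long nchars,
                  long *first_chunk, long *last_chunk)
{
    /* zero-based character offsets; each chunk covers a fixed range */
    *first_chunk = start_char / CHARS_PER_CHUNK;
    *last_chunk = (start_char + nchars - 1) / CHARS_PER_CHUNK;
}

int
main(void)
{
    long first, last;

    /* a substr() of 300 characters starting at character 5000 */
    chunks_for_substr(5000, 300, &first, &last);
    printf("fetch chunks %ld..%ld only\n", first, last);  /* 4..5 */
    return 0;
}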
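And as for a format containing boundaries referenced from within the format, here's one hypothetical shape such a thing could take (none of these structs exist in PostgreSQL, and jsonb's actual on-disk layout was not built this way): put a directory in chunk 0 mapping each top-level key to the chunk where its value starts.

/*
 * Purely hypothetical layout, for illustration only.
 */
#include <stdint.h>
#include <stddef.h>

typedef struct ToastAwareEntry
{
    uint32_t    key_hash;       /* hash of the top-level key */
    uint32_t    first_chunk;    /* chunk number where the value begins */
    uint32_t    chunk_offset;   /* byte offset of the value in that chunk */
    uint32_t    value_len;      /* value length in bytes */
} ToastAwareEntry;

typedef struct ToastAwareHeader
{
    uint32_t    nkeys;          /* number of top-level keys */
    ToastAwareEntry entries[];  /* directory, read from chunk 0 */
} ToastAwareHeader;

static const ToastAwareEntry *
find_key(const ToastAwareHeader *hdr, uint32_t hash)
{
    for (uint32_t i = 0; i < hdr->nkeys; i++)
        if (hdr->entries[i].key_hash == hash)
            return &hdr->entries[i];
    return NULL;                /* key not present at top level */
}

Fetching one top-level key would then mean reading chunk 0 for the directory and pulling only the chunks named in the matching entry, which is exactly the "you know you next need chunk #103" property.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company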