On 18/03/16 19:19, Anastasia Lubennikova wrote:
Please, find the new version of the patch attached. Now it has WAL
functionality.

Detailed description of the feature you can find in README draft
https://goo.gl/50O8Q0

This patch is pretty complicated, so I ask everyone, who interested in
this feature,
to help with reviewing and testing it. I will be grateful for any feedback.
But please, don't complain about code style, it is still work in progress.

Next things I'm going to do:
1. More debugging and testing. I'm going to attach in next message
couple of sql scripts for testing.
2. Fix NULLs processing
3. Add a flag into pg_index, that allows to enable/disable compression
for each particular index.
4. Recheck locking considerations. I tried to write code as less
invasive as possible, but we need to make sure that algorithm is still
correct.
5. Change BTMaxItemSize
6. Bring back microvacuum functionality.

I think we should pack the TIDs more tightly, like GIN does with the varbyte encoding. It's tempting to commit this without it for now, and add the compression later, but I'd like to avoid having to deal with multiple binary-format upgrades, so let's figure out the final on-disk format that we want, right from the beginning.

It would be nice to reuse the varbyte encoding code from GIN, but we might not want to use that exact scheme for B-tree. Firstly, an important criteria when we designed GIN's encoding scheme was to avoid expanding on-disk size for any data set, which meant that a TID had to always be encoded in 6 bytes or less. We don't have that limitation with B-tree, because in B-tree, each item is currently stored as a separate IndexTuple, which is much larger. So we are free to choose an encoding scheme that's better at packing some values, at the expense of using more bytes for other values, if we want to. Some analysis on what we want would be nice. (It's still important that removing a TID from the list never makes the list larger, for VACUUM.)

Secondly, to be able to just always enable this feature, without a GUC or reloption, we might need something that's faster for random access than GIN's posting lists. Or can we just add the setting, but it would be nice to have some more analysis on the worst-case performance before we decide on that.

I find the macros in nbtree.h in the patch quite confusing. They're similar to what we did in GIN, but again we might want to choose differently here. So some discussion on the desired IndexTuple layout is in order. (One clear bug is that using the high bit of BlockNumber for the BT_POSTING flag will fail for a table larger than 2^31 blocks.)

- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to