Hi hackers, This is a follow-up thread to `RFC: compression dictionaries for JSONB` [1]. I would like to share my current progress in order to get early feedback. The patch is currently in a draft state but implements the basic functionality. I did my best to account for all the great feedback I previously got from Alvaro and Matthias.
Usage example: ``` CREATE TYPE mydict AS DICTIONARY OF jsonb ('aaa', 'bbb'); SELECT '{"aaa":"bbb"}' :: mydict; mydict ---------------- {"aaa": "bbb"} SELECT ('{"aaa":"bbb"}' :: mydict) -> 'aaa'; ?column? ---------- "bbb" ``` Here `mydict` works as a transparent replacement for `jsonb`. However, its internal representation differs. The provided dictionary entries ('aaa', 'bbb') are stored in the new catalog table: ``` SELECT * FROM pg_dict; oid | dicttypid | dictentry -------+-----------+----------- 39476 | 39475 | aaa 39477 | 39475 | bbb (2 rows) ``` When `mydict` sees 'aaa' in the document, it replaces it with the corresponding code, in this case - 39476. For more details regarding the compression algorithm and choosen compromises please see the comments in the patch. In pg_type `mydict` has typtype = TYPTYPE_DICT. It works the same way as TYPTYPE_BASE with only difference: corresponding `<type>_in` (pg_type.typinput) and `<another-type>_<type>` (pg_cast.castfunc) procedures receive the dictionary Oid as a `typmod` argument. This way the procedures can distinguish `mydict1` from `mydict2` and use the proper compression dictionary. The approach with alternative `typmod` role is arguably a bit hacky, but it was the less invasive way to implement the feature I've found. I'm open to alternative suggestions. Current limitations (todo): - ALTER TYPE is not implemented - Tests and documentation are missing - Autocomplete is missing Future work (out of scope of this patch): - Support types other than JSONB: TEXT, XML, etc - Automatically updated dictionaries, e.g. during VACUUM - Alternative compression algorithms. Note that this will not require any further changes in the catalog, only the values we write to pg_type and pg_cast will differ. Open questions: - Dictionary entries are currently stored as NameData, the same type that is used for enums. Are we OK with the accompanying limitations? Any alternative suggestions? - All in all, am I moving the right direction? Your feedback is very much welcomed! [1]: https://postgr.es/m/CAJ7c6TPx7N-bVw0dZ1ASCDQKZJHhBYkT6w4HV1LzfS%2BUUTUfmA%40mail.gmail.com -- Best regards, Aleksander Alekseev
v1-0001-CREATE-TYPE-foo-AS-DICTIONARY-OF-JSONB.patch
Description: Binary data