On Thu, 2 Jun 2022 at 14:30, Aleksander Alekseev <aleksan...@timescale.com> wrote:
> > I saw there was some previous discussion about dictionary size. It > > looks like this approach would put all dictionaries into a shared OID > > pool. Since I don't know what a "standard" use case is, is there any > > risk of OID exhaustion for larger deployments with many dictionaries? > > Or is 2**32 so comparatively large that it's not really a serious > > concern? > > I agree, this is a drawback of the current implementation. To be honest, > I simply followed the example of how ENUMs are implemented. I'm not 100% sure > if we should be worried here (apparently, freed OIDs are reused). I'm OK with > using a separate sequence if someone could second this. This is the first time > I'm altering the catalog so I'm not certain what the best practices are. The goal of this patch is great, thank you for working on this (and ZSON). The approach chosen has a few downsides that I'm not happy with yet. * Assigning OIDs for each dictionary entry is not a great idea. I don't see why you would need to do that; just assign monotonically ascending keys for each dictionary, as we do for AttrNums. * There is a limit on SQL statement size, which will effectively limit the size of dictionaries, but the examples are unrealistically small, so this isn't clear as a limitation, but it would be in practice. It would be better to specify a filename, which can be read in when the DDL executes. This can be put into pg_dump output in a similar way to the COPY data for a table is, so once read in it stays static. * The dictionaries are only allowed for certain datatypes. This should not be specifically limited by this patch, i.e. user defined types should not be rejected. * Dictionaries have no versioning. Any list of data items changes over time, so how do we express that? Enums were also invented as static lists originally, then had to be modified later to accomodate additions and revisions, so let's think about that now, even if we don't add all of the commands in one go. Currently we would have to create a whole new dictionary if even one word changes. Ideally, we want the dictionary to have a top-level name and then have multiple versions over time. Let's agree how we are going do these things, so we can make sure the design and code allows for those future enhancements. i.e. how will we do ALTER TABLE ... UPGRADE DICTIONARY without causing a table rewrite? * Does the order of entries in the dictionary allow us to express a priority? i.e. to allow Huffman coding. Thanks for your efforts - this is a very important patch. -- Simon Riggs http://www.EnterpriseDB.com/