On 01/13/2018 04:22 PM, Arthur Zakirov wrote: > Hello, > > Thank you Tomas for your review. > > On Sat, Jan 13, 2018 at 03:25:55AM +0100, Tomas Vondra wrote: >> allocate memory in the buildCxt. What about adding tmpstrdup to copy a >> string into the context? I admit this is mostly nitpicking though. > > ... snip ...> >> 8) One more thing - I've noticed that the hash table uses this key: >> ... >> That is, full paths to the two files, and I'm not sure that's a very >> good idea. Firstly, it's a bit wasteful (1kB per path). But more >> importantly it means all dictionaries referencing the same files will >> share the same chunk of shared memory - not only within a single >> database, but across the whole cluster. That may lead to surprising >> behavior, because e.g. unloading a dictionary in one database will >> affect dictionaries in all other databases referencing the same files. > > Hm, indeed. It's worth to use only file names instead full paths. And > it is good idea to use more information besides file names. It can be > Oid of a database and Oid of a namespace maybe, because a > dictionary can be created in different schemas. >
I doubt using filenames (without the directory paths) solves anything, really. The keys still have to be MAXPGPATH because someone could create very long filename. But I don't think memory consumption is such a big deal, really. With 1000 dictionaries it's still just ~2MB of data, which is negligible compared to the amount of memory saved by sharing the dictionaries. Not sure if we really need to add the database/schema OIDs. I mentioned the unexpected consequences (cross-db sharing) but maybe that's a feature we should keep (it reduces memory usage). So perhaps this should be another CREATE TEXT SEARCH DICTIONARY parameter, allowing sharing the dictionary with other databases? Aren't we overengineering this? > I think your proposals may be implemented in several patches, so they can > be applyed independently but consistently. I suppose I will prepare new > version of the patch with fixes and with initial design of new functions > and commands soon. > Yes, splitting patches into smaller, more focused bits is a good idea. BTW the current patch fails to document the dictionary sharing. It only mentions it when describing the shared_dictionaries GUC. IMHO the right place for additional details is https://www.postgresql.org/docs/10/static/textsearch-dictionaries.html regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services