Re: [PROPOSAL] Shared Ispell dictionaries
On Fri, Apr 5, 2019 at 8:41 PM Alvaro Herrera wrote:
> Is 0001 a bugfix?

Yep, it is a bugfix and can be applied independently. The fix allocates
temporary strings in the temporary context Conf->buildCxt.

-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
Re: [PROPOSAL] Shared Ispell dictionaries
Is 0001 a bugfix?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PROPOSAL] Shared Ispell dictionaries
On 25.02.2019 14:33, Arthur Zakirov wrote:
> It seems to me Tom and Andres also vote for the mmap() approach. I
> think I need to look closely at mmap(). I've labeled the patch as
> 'v13'.

Unfortunately I haven't come up with a new patch yet, so I marked the
entry as "Returned with feedback" for now.

-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
Re: [PROPOSAL] Shared Ispell dictionaries
On 21.02.2019 19:13, Robert Haas wrote:
> So I think it's better to have each backend locally make a decision
> about when that particular backend no longer needs the dictionary, and
> then let the system automatically clean up the ones that are needed by
> nobody.

Yep, it wouldn't be hard to implement.

> Perhaps a better approach still would be to do what Andres proposed
> back in March:
>
> #> Is there any chance we can instead convert dictionaries into a form
> #> we can just mmap() into memory? That'd scale a lot higher and more
> #> dynamically?
>
> The current approach inherently involves double-buffering: you've got
> the filesystem cache containing the data read from disk, and then the
> DSM containing the converted form of the data. Having something that
> you could just mmap() would avoid that, plus it would become a lot
> less critical to keep the mappings around. You could probably just
> have individual queries mmap() it for as long as they need it and then
> tear out the mapping when they finish executing; keeping the mappings
> across queries likely wouldn't be too important in this case. The
> downside is that you'd probably need to teach resowner.c about
> mappings created via mmap() so that you don't leak mappings on an
> abort, but that's probably not a crazy difficult problem.

It seems to me Tom and Andres also vote for the mmap() approach. I think
I need to look closely at mmap(). I've labeled the patch as 'v13'.

-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
Re: [PROPOSAL] Shared Ispell dictionaries
On February 21, 2019 10:08:00 AM PST, Tom Lane wrote:
> Robert Haas writes:
>> Perhaps a better approach still would be to do what Andres proposed
>> back in March:
>
>> #> Is there any chance we can instead convert dictionaries into a
>> #> form we can just mmap() into memory? That'd scale a lot higher and
>> #> more dynamically?
>
> That seems awfully attractive. I was about to question whether we
> could assume that mmap() works everywhere, but it's required by SUSv2
> ... and if anybody has anything sufficiently lame for it not to work,
> we could fall back on malloc-a-hunk-of-memory-and-read-in-the-file.
>
> We'd need a bunch of work to design a position-independent binary
> representation for dictionaries, and then some tool to produce disk
> files containing that, so this isn't exactly a quick route to a
> solution. On the other hand, it isn't sounding like the current patch
> is getting close to committable either.
>
> (Actually, I guess you need a PI representation of a dictionary to put
> it in a DSM either, so presumably that part of the work is done
> already; although we might also wish for architecture independence of
> the disk files, which we probably don't have right now.)

That's what I was pushing for ages ago...

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.
Re: [PROPOSAL] Shared Ispell dictionaries
Robert Haas writes:
> Perhaps a better approach still would be to do what Andres proposed
> back in March:

> #> Is there any chance we can instead convert dictionaries into a form
> #> we can just mmap() into memory? That'd scale a lot higher and more
> #> dynamically?

That seems awfully attractive. I was about to question whether we could
assume that mmap() works everywhere, but it's required by SUSv2 ... and
if anybody has anything sufficiently lame for it not to work, we could
fall back on malloc-a-hunk-of-memory-and-read-in-the-file.

We'd need a bunch of work to design a position-independent binary
representation for dictionaries, and then some tool to produce disk
files containing that, so this isn't exactly a quick route to a
solution. On the other hand, it isn't sounding like the current patch is
getting close to committable either.

(Actually, I guess you need a PI representation of a dictionary to put
it in a DSM either, so presumably that part of the work is done already;
although we might also wish for architecture independence of the disk
files, which we probably don't have right now.)

regards, tom lane
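[Editor's note: Tom's fallback idea — mmap() the on-disk dictionary image where available, otherwise malloc a hunk of memory and read the file in — can be sketched in plain C. This is only an illustration, not code from the patch; the DictImage struct and the load_dict_image()/unload_dict_image() names are invented for the example.]

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

typedef struct DictImage
{
	void	   *data;		/* start of the position-independent image */
	size_t		size;		/* length in bytes */
	int			is_mmap;	/* 1 if mapped, 0 if malloc'd fallback */
} DictImage;

/* Map a dictionary file read-only; fall back to malloc+read on failure. */
static int
load_dict_image(const char *path, DictImage *img)
{
	struct stat st;
	int			fd = open(path, O_RDONLY);

	if (fd < 0 || fstat(fd, &st) < 0)
	{
		if (fd >= 0)
			close(fd);
		return -1;
	}
	img->size = (size_t) st.st_size;

	img->data = mmap(NULL, img->size, PROT_READ, MAP_SHARED, fd, 0);
	if (img->data != MAP_FAILED)
		img->is_mmap = 1;
	else
	{
		/* sufficiently lame platform: read the file into private memory */
		img->data = malloc(img->size);
		if (img->data == NULL ||
			read(fd, img->data, img->size) != (ssize_t) img->size)
		{
			free(img->data);
			close(fd);
			return -1;
		}
		img->is_mmap = 0;
	}
	close(fd);
	return 0;
}

static void
unload_dict_image(DictImage *img)
{
	if (img->is_mmap)
		munmap(img->data, img->size);
	else
		free(img->data);
}
```

A real implementation would additionally need the resowner.c bookkeeping Robert mentions, so an aborted transaction cannot leak the mapping.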
Re: [PROPOSAL] Shared Ispell dictionaries
On Thu, Feb 21, 2019 at 8:28 AM Arthur Zakirov wrote:
> Your approach looks simpler. It is necessary just to periodically scan
> the dictionaries' cache hash table and not call dsm_pin_segment() when
> a DSM segment is initialized. It also means that a dictionary is
> loaded into DSM only while there is a backend attached to the
> dictionary's DSM.

Right. I think that having a central facility that tries to decide
whether or not a dictionary should be kept in shared memory, e.g. based
on a cache size parameter, isn't likely to work well. The problem is
that if we make a decision that a dictionary should be evicted because
it's causing us to exceed the cache size threshold, then we have no way
to implement that decision. We can't force other backends to remove the
mapping immediately, nor can we really bound the time before they
respond to a request to unmap it. They might be in the middle of using
it. So I think it's better to have each backend locally make a decision
about when that particular backend no longer needs the dictionary, and
then let the system automatically clean up the ones that are needed by
nobody.

Perhaps a better approach still would be to do what Andres proposed back
in March:

#> Is there any chance we can instead convert dictionaries into a form
#> we can just mmap() into memory? That'd scale a lot higher and more
#> dynamically?

The current approach inherently involves double-buffering: you've got
the filesystem cache containing the data read from disk, and then the
DSM containing the converted form of the data. Having something that you
could just mmap() would avoid that, plus it would become a lot less
critical to keep the mappings around. You could probably just have
individual queries mmap() it for as long as they need it and then tear
out the mapping when they finish executing; keeping the mappings across
queries likely wouldn't be too important in this case.

The downside is that you'd probably need to teach resowner.c about
mappings created via mmap() so that you don't leak mappings on an abort,
but that's probably not a crazy difficult problem.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [PROPOSAL] Shared Ispell dictionaries
On 21.02.2019 15:45, Robert Haas wrote:
> On Wed, Feb 20, 2019 at 9:33 AM Arthur Zakirov wrote:
>> I'm working on the (b) approach. I thought about a priority queue
>> structure. There is no such ready-made structure within the
>> PostgreSQL sources except binaryheap.c, but it isn't for concurrent
>> algorithms.
>
> I don't see why you need a priority queue or, really, any other fancy
> data structure. It seems like all you need to do is somehow set it up
> so that a backend which doesn't use a dictionary for a while will
> dsm_detach() the segment. Eventually an unused dictionary will have no
> remaining references and will go away.

Hm, I hadn't thought of it that way. Agreed that using a new data
structure is overengineering.

In the current patch all DSM segments are pinned (dsm_pin_segment() is
called for each), so a dictionary lives in shared memory even if nobody
holds a reference to it. I thought about periodically scanning the
shared hash table and unpinning old and unused dictionaries. But this
approach needs a sequential scan facility for dshash. Happily there is
the patch from Kyotaro-san (the
v16-0001-sequential-scan-for-dshash.patch part):
https://www.postgresql.org/message-id/20190221.160555.191280262.horiguchi.kyot...@lab.ntt.co.jp

Your approach looks simpler. It is necessary just to periodically scan
the dictionaries' cache hash table and not call dsm_pin_segment() when a
DSM segment is initialized. It also means that a dictionary is loaded
into DSM only while there is at least one backend attached to the
dictionary's DSM.

-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
Re: [PROPOSAL] Shared Ispell dictionaries
On Wed, Feb 20, 2019 at 9:33 AM Arthur Zakirov wrote:
> I'm working on the (b) approach. I thought about a priority queue
> structure. There is no such ready-made structure within the PostgreSQL
> sources except binaryheap.c, but it isn't for concurrent algorithms.

I don't see why you need a priority queue or, really, any other fancy
data structure. It seems like all you need to do is somehow set it up so
that a backend which doesn't use a dictionary for a while will
dsm_detach() the segment. Eventually an unused dictionary will have no
remaining references and will go away.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [PROPOSAL] Shared Ispell dictionaries
Hello,

I've created a new commitfest entry since the previous entry was closed
with status "Returned with feedback":
https://commitfest.postgresql.org/22/2007/

I attached a new version of the patch. There are changes only in
0003-Retrieve-shared-location-for-dict-v18.patch. I added a reference
counter to the shared hash table's dictionary entries. It is necessary
to avoid memory bloat: shared hash table entries should be deleted when
there are a lot of ALTER and DROP TEXT SEARCH DICTIONARY commands. The
previous version of the patch released unused DSM segments but left
shared hash table entries untouched.

There was a refcnt before:
https://www.postgresql.org/message-id/20180403115720.GA7450%40zakirov.localdomain
But I didn't fully understand how on_dsm_detach() works.

On 22.01.2019 22:17, Tomas Vondra wrote:
> I think there are essentially two ways:
>
> (a) Define max amount of memory available for shared dictionaries, and
> come up with an eviction algorithm. This will be tricky, because when
> the frequently-used dictionaries need a bit more memory than the
> limit, this will result in thrashing (evict+load over and over).
>
> (b) Define what "unused" means for dictionaries, and unload
> dictionaries that become unused. For example, we could track the
> timestamp of the last time each dict was used, and decide that
> dictionaries unused for 5 or more minutes are unused. And evict those.
>
> The advantage of (b) is that it adapts automatically, more or less.
> When you have a bunch of frequently used dictionaries, the amount of
> shared memory increases. If you stop using them, it decreases after a
> while. And rarely used dicts won't force eviction of the frequently
> used ones.

I'm working on the (b) approach. I thought about a priority queue
structure. There is no such ready-made structure within the PostgreSQL
sources except binaryheap.c, but it isn't for concurrent algorithms.
-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

>From 3e220e259eebc6b9730c9500176015b04e588cae Mon Sep 17 00:00:00 2001
From: Arthur Zakirov
Date: Thu, 17 Jan 2019 14:27:32 +0300
Subject: [PATCH 1/4] Fix-ispell-memory-handling

---
 src/backend/tsearch/spell.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/src/backend/tsearch/spell.c b/src/backend/tsearch/spell.c
index eb39466b22..eb8416ce7f 100644
--- a/src/backend/tsearch/spell.c
+++ b/src/backend/tsearch/spell.c
@@ -78,6 +78,8 @@
 #define tmpalloc(sz)  MemoryContextAlloc(Conf->buildCxt, (sz))
 #define tmpalloc0(sz)  MemoryContextAllocZero(Conf->buildCxt, (sz))
 
+#define tmpstrdup(str)  MemoryContextStrdup(Conf->buildCxt, (str))
+
 /*
  * Prepare for constructing an ISpell dictionary.
  *
@@ -498,7 +500,7 @@ NIAddSpell(IspellDict *Conf, const char *word, const char *flag)
 	Conf->Spell[Conf->nspell] = (SPELL *) tmpalloc(SPELLHDRSZ + strlen(word) + 1);
 	strcpy(Conf->Spell[Conf->nspell]->word, word);
 	Conf->Spell[Conf->nspell]->p.flag = (*flag != '\0')
-		? cpstrdup(Conf, flag) : VoidString;
+		? tmpstrdup(flag) : VoidString;
 	Conf->nspell++;
 }
 
@@ -1040,7 +1042,7 @@ setCompoundAffixFlagValue(IspellDict *Conf, CompoundAffixFlag *entry,
 		entry->flag.i = i;
 	}
 	else
-		entry->flag.s = cpstrdup(Conf, s);
+		entry->flag.s = tmpstrdup(s);
 
 	entry->flagMode = Conf->flagMode;
 	entry->value = val;
@@ -1541,6 +1543,9 @@ nextline:
 	return;
 
 isnewformat:
+	pfree(recoded);
+	pfree(pstr);
+
 	if (oldformat)
 		ereport(ERROR,
 				(errcode(ERRCODE_CONFIG_FILE_ERROR),
-- 
2.20.1

>From 291667b579641176ca74eaa343521dd5c258a744 Mon Sep 17 00:00:00 2001
From: Arthur Zakirov
Date: Thu, 17 Jan 2019 15:05:44 +0300
Subject: [PATCH 2/4] Change-tmplinit-argument

---
 contrib/dict_int/dict_int.c          |  4 +-
 contrib/dict_xsyn/dict_xsyn.c        |  4 +-
 contrib/unaccent/unaccent.c          |  4 +-
 src/backend/commands/tsearchcmds.c   | 10 ++---
 src/backend/snowball/dict_snowball.c |  4 +-
 src/backend/tsearch/dict_ispell.c    |  4 +-
 src/backend/tsearch/dict_simple.c    |  4 +-
 src/backend/tsearch/dict_synonym.c   |  4 +-
 src/backend/tsearch/dict_thesaurus.c |  4 +-
 src/backend/utils/cache/ts_cache.c   | 13 +++++-
 src/include/tsearch/ts_cache.h       |  4 ++
 src/include/tsearch/ts_public.h      | 67 ++++++++++++++++++++++++++--
 12 files changed, 105 insertions(+), 21 deletions(-)

diff --git a/contrib/dict_int/dict_int.c b/contrib/dict_int/dict_int.c
index 628b9769c3..ddde55eee4 100644
--- a/contrib/dict_int/dict_int.c
+++ b/contrib/dict_int/dict_int.c
@@ -30,7 +30,7 @@ PG_FUNCTION_INFO_V1(dintdict_lexize);
 Datum
 dintdict_init(PG_FUNCTION_ARGS)
 {
-	List	   *dictoptions = (List *) PG_GETARG_POINTER(0);
+	DictInitData *init_data = (DictInitData *) PG_GETARG_POINTER(0);
 	DictInt    *d;
 	ListCell   *l;
 
@@ -38,7 +38,7 @@ dintdict_init(PG_FUNCTION_ARGS)
 	d->maxlen = 6;
 	d->rejectlong = false;
 
-	foreach(l, dictoptions)
+	foreach(l, init_data->dict_options)
 	{
 		DefElem    *defel = (DefElem *) lfirst(l);
 
diff --git
Re: [PROPOSAL] Shared Ispell dictionaries
Hi,

On 2019-02-01 09:40:44 -0500, Robert Haas wrote:
> On Tue, Jan 22, 2019 at 2:17 PM Tomas Vondra wrote:
>> I think there are essentially two ways:
>>
>> (a) Define max amount of memory available for shared dictionaries,
>> and come up with an eviction algorithm. This will be tricky, because
>> when the frequently-used dictionaries need a bit more memory than the
>> limit, this will result in thrashing (evict+load over and over).
>>
>> (b) Define what "unused" means for dictionaries, and unload
>> dictionaries that become unused. For example, we could track the
>> timestamp of the last time each dict was used, and decide that
>> dictionaries unused for 5 or more minutes are unused. And evict
>> those.
>>
>> The advantage of (b) is that it adapts automatically, more or less.
>> When you have a bunch of frequently used dictionaries, the amount of
>> shared memory increases. If you stop using them, it decreases after a
>> while. And rarely used dicts won't force eviction of the frequently
>> used ones.
>
> +1 for (b).

This patch has been waiting on author for two weeks, the commitfest has
ended, and there's substantial work needed. Therefore I'm marking the
patch as returned with feedback. Please resubmit a new version once the
feedback has been addressed.

Greetings,

Andres Freund
Re: [PROPOSAL] Shared Ispell dictionaries
On Tue, Jan 22, 2019 at 2:17 PM Tomas Vondra wrote:
> I think there are essentially two ways:
>
> (a) Define max amount of memory available for shared dictionaries, and
> come up with an eviction algorithm. This will be tricky, because when
> the frequently-used dictionaries need a bit more memory than the
> limit, this will result in thrashing (evict+load over and over).
>
> (b) Define what "unused" means for dictionaries, and unload
> dictionaries that become unused. For example, we could track the
> timestamp of the last time each dict was used, and decide that
> dictionaries unused for 5 or more minutes are unused. And evict those.
>
> The advantage of (b) is that it adapts automatically, more or less.
> When you have a bunch of frequently used dictionaries, the amount of
> shared memory increases. If you stop using them, it decreases after a
> while. And rarely used dicts won't force eviction of the frequently
> used ones.

+1 for (b).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [PROPOSAL] Shared Ispell dictionaries
On 01.02.2019 12:09, Arthur Zakirov wrote:
> Thanks for sharing your ideas, Tomas. Unfortunately I won't manage to
> develop a new version of the patch till the end of the commitfest due
> to lack of time. I'll think about the second approach. Tracking the
> timestamp of the last time a dict was used may be difficult though and
> may slow down FTS...
>
> I'll move the patch to the next commitfest.

Oh, it seems it can't be moved to the next commitfest from the "Waiting
on Author" status.

-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
Re: [PROPOSAL] Shared Ispell dictionaries
On 22.01.2019 22:17, Tomas Vondra wrote:
> On 1/22/19 7:36 PM, Arthur Zakirov wrote:
>> max_shared_dictionaries_size can be renamed to
>> shared_dictionaries_cleanup_threshold.
>
> That really depends on what exactly the threshold does. If it only
> triggers cleanup but does not enforce a maximum amount of memory used
> by dictionaries, then this name seems OK. If it ensures a max amount
> of memory, the max_..._size name would be better.

Yep, I thought about the first approach.

> I think there are essentially two ways:
>
> (a) Define max amount of memory available for shared dictionaries, and
> come up with an eviction algorithm. This will be tricky, because when
> the frequently-used dictionaries need a bit more memory than the
> limit, this will result in thrashing (evict+load over and over).
>
> (b) Define what "unused" means for dictionaries, and unload
> dictionaries that become unused. For example, we could track the
> timestamp of the last time each dict was used, and decide that
> dictionaries unused for 5 or more minutes are unused. And evict those.
>
> The advantage of (b) is that it adapts automatically, more or less.
> When you have a bunch of frequently used dictionaries, the amount of
> shared memory increases. If you stop using them, it decreases after a
> while. And rarely used dicts won't force eviction of the frequently
> used ones.

Thanks for sharing your ideas, Tomas. Unfortunately I won't manage to
develop a new version of the patch till the end of the commitfest due to
lack of time. I'll think about the second approach. Tracking the
timestamp of the last time a dict was used may be difficult though and
may slow down FTS...

I'll move the patch to the next commitfest.

-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
Re: [PROPOSAL] Shared Ispell dictionaries
On 1/22/19 7:36 PM, Arthur Zakirov wrote:
> Mon, Jan 21, 2019 at 19:42, Arthur Zakirov:
>>
>> On 21.01.2019 17:56, Tomas Vondra wrote:
>>> I wonder if we could devise some simple cache eviction policy. We
>>> don't have any memory limit GUC anymore, but maybe we could unload
>>> dictionaries that were unused for a sufficient amount of time (a
>>> couple of minutes or so). Of course, the question is when exactly
>>> that would happen (it seems far too expensive to invoke on each dict
>>> access, and it should happen even when the dicts are not accessed at
>>> all).
>>
>> Yes, I thought about such a feature too. Agreed, it could be
>> expensive since we need to scan the pg_ts_dict table to get the list
>> of dictionaries (we can't scan a dshash_table).
>>
>> I don't have a good solution yet. I just had a thought to bring back
>> max_shared_dictionaries_size. Then we can unload dictionaries (and
>> scan the pg_ts_dict table) that were accessed a long time ago if we
>> reached the size limit.
>> We can't set an exact size limit since we can't release the memory
>> immediately. So max_shared_dictionaries_size can be renamed to
>> shared_dictionaries_threshold. If it is equal to "0" then PostgreSQL
>> has unlimited space for dictionaries.
>
> I want to propose to clean up segments during vacuum/autovacuum. I'm
> not aware of the policy of cleaning up objects besides relations
> during vacuum/autovacuum. Could it be a good idea?

I doubt that's a good idea, for a couple of reasons. For example, would
it be bound to autovacuum on a particular object, or would it happen as
part of each vacuum run? If the dict cleanup happens only when vacuuming
a particular object, then which one? If it happens on each autovacuum
run, then that may easily be far too frequent (it essentially makes the
cases with too frequent autovacuum runs even worse). But also, what
happens when there is only minimal write activity and thus no regular
autovacuum runs? Surely we should still do the dict cleanup.
> Vacuum might unload dictionaries when the total size of loaded
> dictionaries exceeds a threshold. When it happens vacuum scans loaded
> dictionaries and unloads (unpins segments and removes hash table
> entries) those dictionaries which aren't mapped to any backend process
> anymore (they stay resident otherwise because dsm_pin_segment() is
> called).

Then why bound that to autovacuum at all? Why not just make it part of
loading the dictionary?

> max_shared_dictionaries_size can be renamed to
> shared_dictionaries_cleanup_threshold.

That really depends on what exactly the threshold does. If it only
triggers cleanup but does not enforce a maximum amount of memory used by
dictionaries, then this name seems OK. If it ensures a max amount of
memory, the max_..._size name would be better.

I think there are essentially two ways:

(a) Define max amount of memory available for shared dictionaries, and
come up with an eviction algorithm. This will be tricky, because when
the frequently-used dictionaries need a bit more memory than the limit,
this will result in thrashing (evict+load over and over).

(b) Define what "unused" means for dictionaries, and unload dictionaries
that become unused. For example, we could track the timestamp of the
last time each dict was used, and decide that dictionaries unused for 5
or more minutes are unused. And evict those.

The advantage of (b) is that it adapts automatically, more or less. When
you have a bunch of frequently used dictionaries, the amount of shared
memory increases. If you stop using them, it decreases after a while.
And rarely used dicts won't force eviction of the frequently used ones.

cheers

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
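[Editor's note: approach (b) — evict dictionaries idle for five minutes or more — can be illustrated with a small last-used-timestamp sweep. This is a hedged sketch, not patch code; DictCacheEntry, dict_touch(), and dict_sweep() are invented names, and the loaded flag stands in for unpinning a real DSM segment.]

```c
#include <time.h>

#define DICT_IDLE_SECONDS (5 * 60)	/* "unused" threshold from the discussion */

typedef struct DictCacheEntry
{
	const char *name;
	time_t		last_used;	/* timestamp of the last lookup using this dict */
	int			loaded;		/* stand-in for "DSM segment is pinned" */
} DictCacheEntry;

/* Mark a dictionary as used "now", resetting its idle clock. */
static void
dict_touch(DictCacheEntry *e, time_t now)
{
	e->last_used = now;
	e->loaded = 1;
}

/*
 * Unload every dictionary that has been idle for DICT_IDLE_SECONDS or
 * more; returns how many entries were evicted.
 */
static int
dict_sweep(DictCacheEntry *entries, int n, time_t now)
{
	int			evicted = 0;

	for (int i = 0; i < n; i++)
	{
		if (entries[i].loaded &&
			now - entries[i].last_used >= DICT_IDLE_SECONDS)
		{
			entries[i].loaded = 0;	/* stand-in for unpinning the DSM */
			evicted++;
		}
	}
	return evicted;
}
```

The open question from the thread remains visible even in the sketch: someone has to call dict_sweep() periodically, since per-access invocation is too expensive and idle dictionaries generate no accesses at all.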
Re: [PROPOSAL] Shared Ispell dictionaries
Mon, Jan 21, 2019 at 19:42, Arthur Zakirov:
>
> On 21.01.2019 17:56, Tomas Vondra wrote:
>> I wonder if we could devise some simple cache eviction policy. We
>> don't have any memory limit GUC anymore, but maybe we could unload
>> dictionaries that were unused for a sufficient amount of time (a
>> couple of minutes or so). Of course, the question is when exactly
>> that would happen (it seems far too expensive to invoke on each dict
>> access, and it should happen even when the dicts are not accessed at
>> all).
>
> Yes, I thought about such a feature too. Agreed, it could be expensive
> since we need to scan the pg_ts_dict table to get the list of
> dictionaries (we can't scan a dshash_table).
>
> I don't have a good solution yet. I just had a thought to bring back
> max_shared_dictionaries_size. Then we can unload dictionaries (and
> scan the pg_ts_dict table) that were accessed a long time ago if we
> reached the size limit.
> We can't set an exact size limit since we can't release the memory
> immediately. So max_shared_dictionaries_size can be renamed to
> shared_dictionaries_threshold. If it is equal to "0" then PostgreSQL
> has unlimited space for dictionaries.

I want to propose to clean up segments during vacuum/autovacuum. I'm not
aware of the policy of cleaning up objects besides relations during
vacuum/autovacuum. Could it be a good idea?

Vacuum might unload dictionaries when the total size of loaded
dictionaries exceeds a threshold. When that happens vacuum scans loaded
dictionaries and unloads (unpins segments and removes hash table
entries) those dictionaries which aren't mapped to any backend process
anymore (they stay resident otherwise because dsm_pin_segment() is
called).

max_shared_dictionaries_size can be renamed to
shared_dictionaries_cleanup_threshold.

-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
Re: [PROPOSAL] Shared Ispell dictionaries
On 21.01.2019 17:56, Tomas Vondra wrote:
> On 1/21/19 12:51 PM, Arthur Zakirov wrote:
>> I'll try to implement the syntax you suggested earlier:
>>
>> ALTER TEXT SEARCH DICTIONARY x UNLOAD/RELOAD
>>
>> The main point here is that UNLOAD/RELOAD can't release the memory
>> immediately, because some other backend may pin a DSM.
>>
>> The second point we should consider (I think) is how we know which
>> dictionary should be unloaded. There was such a function earlier,
>> which was removed. But what about adding the information to the
>> "\dFd" psql command output? It could be a column which shows whether
>> a dictionary is loaded.
>
> [...] The only thing we have is "unload" capability by closing the
> connection [...]

BTW, even if the connection was closed and there are no other
connections, a dictionary still remains "loaded". That is because
dsm_pin_segment() is called while loading the dictionary into DSM.

> [...] I wonder if we could devise some simple cache eviction policy.
> We don't have any memory limit GUC anymore, but maybe we could unload
> dictionaries that were unused for a sufficient amount of time (a
> couple of minutes or so). Of course, the question is when exactly that
> would happen (it seems far too expensive to invoke on each dict
> access, and it should happen even when the dicts are not accessed at
> all).

Yes, I thought about such a feature too. Agreed, it could be expensive
since we need to scan the pg_ts_dict table to get the list of
dictionaries (we can't scan a dshash_table).

I don't have a good solution yet. I just had a thought to bring back
max_shared_dictionaries_size. Then we can unload dictionaries (and scan
the pg_ts_dict table) that were accessed a long time ago if we reached
the size limit.

We can't set an exact size limit since we can't release the memory
immediately. So max_shared_dictionaries_size can be renamed to
shared_dictionaries_threshold. If it is equal to "0" then PostgreSQL has
unlimited space for dictionaries.

-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
Re: [PROPOSAL] Shared Ispell dictionaries
On 1/21/19 12:51 PM, Arthur Zakirov wrote:
> On 21.01.2019 02:43, Tomas Vondra wrote:
>> On 1/20/19 11:21 PM, Andres Freund wrote:
>>> On 2019-01-20 23:15:35 +0100, Tomas Vondra wrote:
>>>> Thanks. I've reviewed v17 today and I haven't discovered any new
>>>> issues so far. If everything goes fine and no one protests, I plan
>>>> to get it committed over the next week or so.
>>>
>>> There doesn't seem to be any docs about what's needed to be able to
>>> take advantage of shared dicts, and how to prevent them from
>>> permanently taking up a significant share of memory.
>>
>> Yeah, those are good points. I agree the comments might be clearer,
>> but essentially ispell dictionaries are shared and everything else is
>> not.
>>
>> As for the memory consumption / unloading dicts - I agree that's
>> something we need to address. There used to be a way to specify a
>> memory limit and the ability to unload dictionaries explicitly, but
>> both features have been ditched. The assumption was that UNLOAD would
>> be introduced later, but that does not seem to have happened.
>
> I'll try to implement the syntax you suggested earlier:
>
> ALTER TEXT SEARCH DICTIONARY x UNLOAD/RELOAD
>
> The main point here is that UNLOAD/RELOAD can't release the memory
> immediately, because some other backend may pin a DSM.
>
> The second point we should consider (I think) is how we know which
> dictionary should be unloaded. There was such a function earlier,
> which was removed. But what about adding the information to the "\dFd"
> psql command output? It could be a column which shows whether a
> dictionary is loaded.

The UNLOAD capability is probably a good start, but it's entirely manual
and I wonder if it's putting too much burden on the user. I mean, the
user has to realize the dictionaries are using a lot of shared memory,
has to decide which to unload, and then has to do UNLOAD on it. That's
not quite straightforward, especially if there's no way to determine
which dictionaries are currently loaded and how much memory they use :-(

Of course, the problem is not exactly new - we don't show dictionaries
already loaded into private memory. The only thing we have is "unload"
capability by closing the connection. OTOH the memory consumption should
be much lower thanks to using shared memory. So I think the patch is an
improvement even in this regard.

I wonder if we could devise some simple cache eviction policy. We don't
have any memory limit GUC anymore, but maybe we could unload
dictionaries that were unused for a sufficient amount of time (a couple
of minutes or so). Of course, the question is when exactly that would
happen (it seems far too expensive to invoke on each dict access, and it
should happen even when the dicts are not accessed at all).

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PROPOSAL] Shared Ispell dictionaries
On 21.01.2019 02:43, Tomas Vondra wrote:
> On 1/20/19 11:21 PM, Andres Freund wrote:
>> On 2019-01-20 23:15:35 +0100, Tomas Vondra wrote:
>>> Thanks. I've reviewed v17 today and I haven't discovered any new
>>> issues so far. If everything goes fine and no one protests, I plan
>>> to get it committed over the next week or so.
>>
>> There doesn't seem to be any docs about what's needed to be able to
>> take advantage of shared dicts, and how to prevent them from
>> permanently taking up a significant share of memory.
>
> Yeah, those are good points. I agree the comments might be clearer,
> but essentially ispell dictionaries are shared and everything else is
> not.
>
> As for the memory consumption / unloading dicts - I agree that's
> something we need to address. There used to be a way to specify a
> memory limit and the ability to unload dictionaries explicitly, but
> both features have been ditched. The assumption was that UNLOAD would
> be introduced later, but that does not seem to have happened.

I'll try to implement the syntax you suggested earlier:

ALTER TEXT SEARCH DICTIONARY x UNLOAD/RELOAD

The main point here is that UNLOAD/RELOAD can't release the memory
immediately, because some other backend may pin a DSM.

The second point we should consider (I think) is how we know which
dictionary should be unloaded. There was such a function earlier, which
was removed. But what about adding the information to the "\dFd" psql
command output? It could be a column which shows whether a dictionary is
loaded.

-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
Re: [PROPOSAL] Shared Ispell dictionaries
On 1/20/19 11:21 PM, Andres Freund wrote:
> On 2019-01-20 23:15:35 +0100, Tomas Vondra wrote:
>> On 1/17/19 3:15 PM, Arthur Zakirov wrote:
>>> I attached files of new version of the patch, I applied your tweaks.
>>>
>>>> XXX All dictionaries, but only when there's invalid dictionary?
>>>
>>> I've made a little optimization. I introduced hashvalue into
>>> TSDictionaryCacheEntry. Now released only DSM of altered or dropped
>>> dictionaries.
>>>
>>>>> /* XXX not really a pointer, so the name is misleading */
>>>>
>>>> I think we don't need DictPointerData struct anymore, because only
>>>> ts_dict_shmem_release function needs it (see comments above) and we
>>>> only need it to hash search. I'll move all fields of
>>>> DictPointerData to TsearchDictKey struct.
>>>
>>> I was wrong, DictInitData also needs DictPointerData. I didn't
>>> remove DictPointerData, I renamed it to DictEntryData. Hope that it
>>> is a more appropriate name.
>>
>> Thanks. I've reviewed v17 today and I haven't discovered any new
>> issues so far. If everything goes fine and no one protests, I plan to
>> get it committed over the next week or so.
>
> There doesn't seem to be any docs about what's needed to be able to
> take advantage of shared dicts, and how to prevent them from
> permanently taking up a significant share of memory.

Yeah, those are good points. I agree the comments might be clearer, but
essentially ispell dictionaries are shared and everything else is not.

As for the memory consumption / unloading dicts - I agree that's
something we need to address. There used to be a way to specify a memory
limit and the ability to unload dictionaries explicitly, but both
features have been ditched. The assumption was that UNLOAD would be
introduced later, but that does not seem to have happened.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PROPOSAL] Shared Ispell dictionaries
On 2019-01-20 23:15:35 +0100, Tomas Vondra wrote: > On 1/17/19 3:15 PM, Arthur Zakirov wrote: > > I attached files of new version of the patch, I applied your tweaks. > > > >> XXX All dictionaries, but only when there's invalid dictionary? > > > > I've made a little optimization. I introduced hashvalue into > > TSDictionaryCacheEntry. Now released only DSM of altered or dropped > > dictionaries. > > > >> > /* XXX not really a pointer, so the name is misleading */ > >> > >> I think we don't need DictPointerData struct anymore, because only > >> ts_dict_shmem_release function needs it (see comments above) and we only > >> need it to hash search. I'll move all fields of DictPointerData to > >> TsearchDictKey struct. > > > > I was wrong, DictInitData also needs DictPointerData. I didn't remove > > DictPointerData, I renamed it to DictEntryData. Hope that it is a more > > appropriate name. > > > > Thanks. I've reviewed v17 today and I haven't discovered any new issues > so far. If everything goes fine and no one protests, I plan to get it > committed over the next week or so. There doesn't seem to be any docs about what's needed to be able to take advantage of shared dicts, and how to prevent them from permanently taking up a significant share of memory. Greetings, Andres Freund
Re: [PROPOSAL] Shared Ispell dictionaries
On 1/17/19 3:15 PM, Arthur Zakirov wrote: > I attached files of new version of the patch, I applied your tweaks. > >> XXX All dictionaries, but only when there's invalid dictionary? > > I've made a little optimization. I introduced hashvalue into > TSDictionaryCacheEntry. Now released only DSM of altered or dropped > dictionaries. > >> > /* XXX not really a pointer, so the name is misleading */ >> >> I think we don't need DictPointerData struct anymore, because only >> ts_dict_shmem_release function needs it (see comments above) and we only >> need it to hash search. I'll move all fields of DictPointerData to >> TsearchDictKey struct. > > I was wrong, DictInitData also needs DictPointerData. I didn't remove > DictPointerData, I renamed it to DictEntryData. Hope that it is a more > appropriate name. > Thanks. I've reviewed v17 today and I haven't discovered any new issues so far. If everything goes fine and no one protests, I plan to get it committed over the next week or so. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PROPOSAL] Shared Ispell dictionaries
I attached files of new version of the patch, I applied your tweaks. > XXX All dictionaries, but only when there's invalid dictionary? I've made a little optimization. I introduced hashvalue into TSDictionaryCacheEntry. Now released only DSM of altered or dropped dictionaries. > > /* XXX not really a pointer, so the name is misleading */ > > I think we don't need DictPointerData struct anymore, because only > ts_dict_shmem_release function needs it (see comments above) and we only > need it to hash search. I'll move all fields of DictPointerData to > TsearchDictKey struct. I was wrong, DictInitData also needs DictPointerData. I didn't remove DictPointerData, I renamed it to DictEntryData. Hope that it is a more appropriate name. -- Arthur Zakirov Postgres Professional: http://www.postgrespro.com Russian Postgres Company >From c20c171c2107efc6f87b688af0feecf2f98fcd69 Mon Sep 17 00:00:00 2001 From: Arthur Zakirov Date: Thu, 17 Jan 2019 14:27:32 +0300 Subject: [PATCH 1/4] Fix-ispell-memory-handling --- src/backend/tsearch/spell.c | 9 +++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/src/backend/tsearch/spell.c b/src/backend/tsearch/spell.c index eb39466b22..eb8416ce7f 100644 --- a/src/backend/tsearch/spell.c +++ b/src/backend/tsearch/spell.c @@ -78,6 +78,8 @@ #define tmpalloc(sz) MemoryContextAlloc(Conf->buildCxt, (sz)) #define tmpalloc0(sz) MemoryContextAllocZero(Conf->buildCxt, (sz)) +#define tmpstrdup(str) MemoryContextStrdup(Conf->buildCxt, (str)) + /* * Prepare for constructing an ISpell dictionary. * @@ -498,7 +500,7 @@ NIAddSpell(IspellDict *Conf, const char *word, const char *flag) Conf->Spell[Conf->nspell] = (SPELL *) tmpalloc(SPELLHDRSZ + strlen(word) + 1); strcpy(Conf->Spell[Conf->nspell]->word, word); Conf->Spell[Conf->nspell]->p.flag = (*flag != '\0') - ? cpstrdup(Conf, flag) : VoidString; + ? 
tmpstrdup(flag) : VoidString; Conf->nspell++; } @@ -1040,7 +1042,7 @@ setCompoundAffixFlagValue(IspellDict *Conf, CompoundAffixFlag *entry, entry->flag.i = i; } else - entry->flag.s = cpstrdup(Conf, s); + entry->flag.s = tmpstrdup(s); entry->flagMode = Conf->flagMode; entry->value = val; @@ -1541,6 +1543,9 @@ nextline: return; isnewformat: + pfree(recoded); + pfree(pstr); + if (oldformat) ereport(ERROR, (errcode(ERRCODE_CONFIG_FILE_ERROR), -- 2.20.1 >From ca45e4ca314bdf8bed1a47796afb3e86c6fcd684 Mon Sep 17 00:00:00 2001 From: Arthur Zakirov Date: Thu, 17 Jan 2019 15:05:44 +0300 Subject: [PATCH 2/4] Change-tmplinit-argument --- contrib/dict_int/dict_int.c | 4 +- contrib/dict_xsyn/dict_xsyn.c| 4 +- contrib/unaccent/unaccent.c | 4 +- src/backend/commands/tsearchcmds.c | 10 - src/backend/snowball/dict_snowball.c | 4 +- src/backend/tsearch/dict_ispell.c| 4 +- src/backend/tsearch/dict_simple.c| 4 +- src/backend/tsearch/dict_synonym.c | 4 +- src/backend/tsearch/dict_thesaurus.c | 4 +- src/backend/utils/cache/ts_cache.c | 13 +- src/include/tsearch/ts_cache.h | 4 ++ src/include/tsearch/ts_public.h | 67 ++-- 12 files changed, 105 insertions(+), 21 deletions(-) diff --git a/contrib/dict_int/dict_int.c b/contrib/dict_int/dict_int.c index 628b9769c3..ddde55eee4 100644 --- a/contrib/dict_int/dict_int.c +++ b/contrib/dict_int/dict_int.c @@ -30,7 +30,7 @@ PG_FUNCTION_INFO_V1(dintdict_lexize); Datum dintdict_init(PG_FUNCTION_ARGS) { - List *dictoptions = (List *) PG_GETARG_POINTER(0); + DictInitData *init_data = (DictInitData *) PG_GETARG_POINTER(0); DictInt*d; ListCell *l; @@ -38,7 +38,7 @@ dintdict_init(PG_FUNCTION_ARGS) d->maxlen = 6; d->rejectlong = false; - foreach(l, dictoptions) + foreach(l, init_data->dict_options) { DefElem*defel = (DefElem *) lfirst(l); diff --git a/contrib/dict_xsyn/dict_xsyn.c b/contrib/dict_xsyn/dict_xsyn.c index 509e14aee0..15b1a0033a 100644 --- a/contrib/dict_xsyn/dict_xsyn.c +++ b/contrib/dict_xsyn/dict_xsyn.c @@ -140,7 +140,7 @@ 
read_dictionary(DictSyn *d, const char *filename) Datum dxsyn_init(PG_FUNCTION_ARGS) { - List *dictoptions = (List *) PG_GETARG_POINTER(0); + DictInitData *init_data = (DictInitData *) PG_GETARG_POINTER(0); DictSyn*d; ListCell *l; char *filename = NULL; @@ -153,7 +153,7 @@ dxsyn_init(PG_FUNCTION_ARGS) d->matchsynonyms = false; d->keepsynonyms = true; - foreach(l, dictoptions) + foreach(l, init_data->dict_options) { DefElem*defel = (DefElem *) lfirst(l); diff --git a/contrib/unaccent/unaccent.c b/contrib/unaccent/unaccent.c index fc5176e338..f3663cefd0 100644 --- a/contrib/unaccent/unaccent.c +++ b/contrib/unaccent/unaccent.c @@ -270,12 +270,12 @@ PG_FUNCTION_INFO_V1(unaccent_init); Datum unaccent_init(PG_FUNCTION_ARGS) { - List *dictoptions = (List *) PG_GETARG_POINTER(0); + DictInitData *init_data = (DictInitData *) PG_GETARG_POINTER(0); TrieChar *rootTrie = NULL; bool fileloaded
Re: [PROPOSAL] Shared Ispell dictionaries
Hello Tomas, On 16.01.2019 03:23, Tomas Vondra wrote: I've looked at the patch today, and in general it seems quite solid to me. I do have a couple of minor points 1) I think the comments need more work. Instead of describing all the individual changes here, I've outlined those improvements in attached patches (see the attached "tweaks" patches). Some of it is formatting, minor rewording or larger changes. Some comments are rather redundant (e.g. the one before calls to release the DSM segment). Thank you! 2) It's not quite clear to me why we need DictInitData, which simply combines DictPointerData and list of options. It seems as if the only point is to pass a single parameter to the init function, but is it worth it? Why not get rid of DictInitData entirely and pass two parameters instead? Initially, the init method had two parameters. But in the v7 patch I added the DictInitData struct in place of the two parameters (list of options and DictPointerData): https://www.postgresql.org/message-id/20180319110648.GA32319%40zakirov.localdomain I don't see a way to change a template's init method from init_method(internal) to init_method(internal,internal) in an extension's upgrade script. If I'm not mistaken we need new syntax here, like ALTER TEXT SEARCH TEMPLATE. Thoughts? 3) I find it a bit cumbersome that before each ts_dict_shmem_release call we construct a dummy DictPointerData value. Why not pass individual parameters and construct the struct in the function? Agreed, it may look too verbose. I'll change it. 4) The reference to max_shared_dictionaries_size is obsolete, because there's no such limit anymore. Yeah, I'll fix it. > /* XXX not really a pointer, so the name is misleading */ I think we don't need DictPointerData struct anymore, because only ts_dict_shmem_release function needs it (see comments above) and we only need it to hash search. I'll move all fields of DictPointerData to TsearchDictKey struct. 
> XXX "supported" is not the same as "all ispell dicts behave like that". I'll reword the sentence. -- Arthur Zakirov Postgres Professional: http://www.postgrespro.com Russian Postgres Company
Re: [PROPOSAL] Shared Ispell dictionaries
Hello Arthur, I've looked at the patch today, and in general it seems quite solid to me. I do have a couple of minor points 1) I think the comments need more work. Instead of describing all the individual changes here, I've outlined those improvements in attached patches (see the attached "tweaks" patches). Some of it is formatting, minor rewording or larger changes. Some comments are rather redundant (e.g. the one before calls to release the DSM segment). 2) It's not quite clear to me why we need DictInitData, which simply combines DictPointerData and list of options. It seems as if the only point is to pass a single parameter to the init function, but is it worth it? Why not get rid of DictInitData entirely and pass two parameters instead? 3) I find it a bit cumbersome that before each ts_dict_shmem_release call we construct a dummy DictPointerData value. Why not pass individual parameters and construct the struct in the function? 4) The reference to max_shared_dictionaries_size is obsolete, because there's no such limit anymore. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services >From e76d34bcb1a84127f9b4402f0147642d77505cc2 Mon Sep 17 00:00:00 2001 From: Tomas Vondra Date: Tue, 15 Jan 2019 22:16:35 +0100 Subject: [PATCH 1/7] Fix ispell memory handling --- src/backend/tsearch/spell.c | 9 +++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/src/backend/tsearch/spell.c b/src/backend/tsearch/spell.c index eb39466b22..eb8416ce7f 100644 --- a/src/backend/tsearch/spell.c +++ b/src/backend/tsearch/spell.c @@ -78,6 +78,8 @@ #define tmpalloc(sz) MemoryContextAlloc(Conf->buildCxt, (sz)) #define tmpalloc0(sz) MemoryContextAllocZero(Conf->buildCxt, (sz)) +#define tmpstrdup(str) MemoryContextStrdup(Conf->buildCxt, (str)) + /* * Prepare for constructing an ISpell dictionary. 
* @@ -498,7 +500,7 @@ NIAddSpell(IspellDict *Conf, const char *word, const char *flag) Conf->Spell[Conf->nspell] = (SPELL *) tmpalloc(SPELLHDRSZ + strlen(word) + 1); strcpy(Conf->Spell[Conf->nspell]->word, word); Conf->Spell[Conf->nspell]->p.flag = (*flag != '\0') - ? cpstrdup(Conf, flag) : VoidString; + ? tmpstrdup(flag) : VoidString; Conf->nspell++; } @@ -1040,7 +1042,7 @@ setCompoundAffixFlagValue(IspellDict *Conf, CompoundAffixFlag *entry, entry->flag.i = i; } else - entry->flag.s = cpstrdup(Conf, s); + entry->flag.s = tmpstrdup(s); entry->flagMode = Conf->flagMode; entry->value = val; @@ -1541,6 +1543,9 @@ nextline: return; isnewformat: + pfree(recoded); + pfree(pstr); + if (oldformat) ereport(ERROR, (errcode(ERRCODE_CONFIG_FILE_ERROR), -- 2.17.2 >From dbb3cc3b7e7c560472cfa5efa77598f4992e0b70 Mon Sep 17 00:00:00 2001 From: Tomas Vondra Date: Tue, 15 Jan 2019 22:17:32 +0100 Subject: [PATCH 2/7] Change tmplinit argument --- contrib/dict_int/dict_int.c | 4 +- contrib/dict_xsyn/dict_xsyn.c| 4 +- contrib/unaccent/unaccent.c | 4 +- src/backend/commands/tsearchcmds.c | 10 +++- src/backend/snowball/dict_snowball.c | 4 +- src/backend/tsearch/dict_ispell.c| 4 +- src/backend/tsearch/dict_simple.c| 4 +- src/backend/tsearch/dict_synonym.c | 4 +- src/backend/tsearch/dict_thesaurus.c | 4 +- src/backend/utils/cache/ts_cache.c | 19 +++- src/include/tsearch/ts_cache.h | 4 ++ src/include/tsearch/ts_public.h | 69 ++-- 12 files changed, 113 insertions(+), 21 deletions(-) diff --git a/contrib/dict_int/dict_int.c b/contrib/dict_int/dict_int.c index 628b9769c3..ddde55eee4 100644 --- a/contrib/dict_int/dict_int.c +++ b/contrib/dict_int/dict_int.c @@ -30,7 +30,7 @@ PG_FUNCTION_INFO_V1(dintdict_lexize); Datum dintdict_init(PG_FUNCTION_ARGS) { - List *dictoptions = (List *) PG_GETARG_POINTER(0); + DictInitData *init_data = (DictInitData *) PG_GETARG_POINTER(0); DictInt*d; ListCell *l; @@ -38,7 +38,7 @@ dintdict_init(PG_FUNCTION_ARGS) d->maxlen = 6; d->rejectlong = false; - foreach(l, 
dictoptions) + foreach(l, init_data->dict_options) { DefElem*defel = (DefElem *) lfirst(l); diff --git a/contrib/dict_xsyn/dict_xsyn.c b/contrib/dict_xsyn/dict_xsyn.c index 509e14aee0..15b1a0033a 100644 --- a/contrib/dict_xsyn/dict_xsyn.c +++ b/contrib/dict_xsyn/dict_xsyn.c @@ -140,7 +140,7 @@ read_dictionary(DictSyn *d, const char *filename) Datum dxsyn_init(PG_FUNCTION_ARGS) { - List *dictoptions = (List *) PG_GETARG_POINTER(0); + DictInitData *init_data = (DictInitData *) PG_GETARG_POINTER(0); DictSyn*d; ListCell *l; char *filename = NULL; @@ -153,7 +153,7 @@ dxsyn_init(PG_FUNCTION_ARGS) d->matchsynonyms = false; d->keepsynonyms = true; - foreach(l, dictoptions) + foreach(l, init_data->dict_options) { DefElem*defel = (DefElem *) lfirst(l); diff --git a/contrib/unaccent/unaccent.c b/contrib/unaccent/unaccent.c index
Re: [PROPOSAL] Shared Ispell dictionaries
On 01.10.2018 12:22, Arthur Zakirov wrote: On Thu, Jun 14, 2018 at 11:40:17AM +0300, Arthur Zakirov wrote: I attached new version of the patch. The patch still applies to HEAD. I moved it to the next commitfest. Here is the rebased patch. I also updated copyright in ts_shared.h and ts_shared.c. -- Arthur Zakirov Postgres Professional: http://www.postgrespro.com Russian Postgres Company diff --git a/src/backend/tsearch/spell.c b/src/backend/tsearch/spell.c index eb39466b22..eb8416ce7f 100644 --- a/src/backend/tsearch/spell.c +++ b/src/backend/tsearch/spell.c @@ -78,6 +78,8 @@ #define tmpalloc(sz) MemoryContextAlloc(Conf->buildCxt, (sz)) #define tmpalloc0(sz) MemoryContextAllocZero(Conf->buildCxt, (sz)) +#define tmpstrdup(str) MemoryContextStrdup(Conf->buildCxt, (str)) + /* * Prepare for constructing an ISpell dictionary. * @@ -498,7 +500,7 @@ NIAddSpell(IspellDict *Conf, const char *word, const char *flag) Conf->Spell[Conf->nspell] = (SPELL *) tmpalloc(SPELLHDRSZ + strlen(word) + 1); strcpy(Conf->Spell[Conf->nspell]->word, word); Conf->Spell[Conf->nspell]->p.flag = (*flag != '\0') - ? cpstrdup(Conf, flag) : VoidString; + ? 
tmpstrdup(flag) : VoidString; Conf->nspell++; } @@ -1040,7 +1042,7 @@ setCompoundAffixFlagValue(IspellDict *Conf, CompoundAffixFlag *entry, entry->flag.i = i; } else - entry->flag.s = cpstrdup(Conf, s); + entry->flag.s = tmpstrdup(s); entry->flagMode = Conf->flagMode; entry->value = val; @@ -1541,6 +1543,9 @@ nextline: return; isnewformat: + pfree(recoded); + pfree(pstr); + if (oldformat) ereport(ERROR, (errcode(ERRCODE_CONFIG_FILE_ERROR), diff --git a/contrib/dict_int/dict_int.c b/contrib/dict_int/dict_int.c index 628b9769c3..ddde55eee4 100644 --- a/contrib/dict_int/dict_int.c +++ b/contrib/dict_int/dict_int.c @@ -30,7 +30,7 @@ PG_FUNCTION_INFO_V1(dintdict_lexize); Datum dintdict_init(PG_FUNCTION_ARGS) { - List *dictoptions = (List *) PG_GETARG_POINTER(0); + DictInitData *init_data = (DictInitData *) PG_GETARG_POINTER(0); DictInt*d; ListCell *l; @@ -38,7 +38,7 @@ dintdict_init(PG_FUNCTION_ARGS) d->maxlen = 6; d->rejectlong = false; - foreach(l, dictoptions) + foreach(l, init_data->dict_options) { DefElem*defel = (DefElem *) lfirst(l); diff --git a/contrib/dict_xsyn/dict_xsyn.c b/contrib/dict_xsyn/dict_xsyn.c index 509e14aee0..15b1a0033a 100644 --- a/contrib/dict_xsyn/dict_xsyn.c +++ b/contrib/dict_xsyn/dict_xsyn.c @@ -140,7 +140,7 @@ read_dictionary(DictSyn *d, const char *filename) Datum dxsyn_init(PG_FUNCTION_ARGS) { - List *dictoptions = (List *) PG_GETARG_POINTER(0); + DictInitData *init_data = (DictInitData *) PG_GETARG_POINTER(0); DictSyn*d; ListCell *l; char *filename = NULL; @@ -153,7 +153,7 @@ dxsyn_init(PG_FUNCTION_ARGS) d->matchsynonyms = false; d->keepsynonyms = true; - foreach(l, dictoptions) + foreach(l, init_data->dict_options) { DefElem*defel = (DefElem *) lfirst(l); diff --git a/contrib/unaccent/unaccent.c b/contrib/unaccent/unaccent.c index fc5176e338..f3663cefd0 100644 --- a/contrib/unaccent/unaccent.c +++ b/contrib/unaccent/unaccent.c @@ -270,12 +270,12 @@ PG_FUNCTION_INFO_V1(unaccent_init); Datum unaccent_init(PG_FUNCTION_ARGS) { - List 
*dictoptions = (List *) PG_GETARG_POINTER(0); + DictInitData *init_data = (DictInitData *) PG_GETARG_POINTER(0); TrieChar *rootTrie = NULL; bool fileloaded = false; ListCell *l; - foreach(l, dictoptions) + foreach(l, init_data->dict_options) { DefElem*defel = (DefElem *) lfirst(l); diff --git a/src/backend/commands/tsearchcmds.c b/src/backend/commands/tsearchcmds.c index cda21675f0..93a71adc5d 100644 --- a/src/backend/commands/tsearchcmds.c +++ b/src/backend/commands/tsearchcmds.c @@ -390,17 +390,25 @@ verify_dictoptions(Oid tmplId, List *dictoptions) } else { + DictInitData init_data; + /* * Copy the options just in case init method thinks it can scribble on * them ... */ dictoptions = copyObject(dictoptions); + init_data.dict_options = dictoptions; + init_data.dict.id = InvalidOid; + init_data.dict.xmin = InvalidTransactionId; + init_data.dict.xmax = InvalidTransactionId; + ItemPointerSetInvalid(&init_data.dict.tid); + /* * Call the init method and see if it complains. We don't worry about * it leaking memory, since our command will soon be over anyway. */ - (void) OidFunctionCall1(initmethod, PointerGetDatum(dictoptions)); + (void) OidFunctionCall1(initmethod, PointerGetDatum(&init_data)); } ReleaseSysCache(tup); diff --git a/src/backend/snowball/dict_snowball.c b/src/backend/snowball/dict_snowball.c index 5166738310..f30f29865c 100644 --- a/src/backend/snowball/dict_snowball.c +++ b/src/backend/snowball/dict_snowball.c @@ -201,14 +201,14 @@ locate_stem_module(DictSnowball *d, const char *lang) Datum dsnowball_init(PG_FUNCTION_ARGS) { - List *dictoptions = (List *) PG_GETARG_POINTER(0); + DictInitData *init_data =
Re: [PROPOSAL] Shared Ispell dictionaries
On Thu, Jun 14, 2018 at 11:40:17AM +0300, Arthur Zakirov wrote: > I attached new version of the patch. The patch still applies to HEAD. I moved it to the next commitfest. -- Arthur Zakirov Postgres Professional: http://www.postgrespro.com Russian Postgres Company
Re: [PROPOSAL] Shared Ispell dictionaries
On Wed, May 16, 2018 at 02:36:33PM +0300, Arthur Zakirov wrote: > ... I attached the rebased patch. I attached new version of the patch. I found a bug when CompoundAffix, SuffixNodes, PrefixNodes, DictNodes of IspellDictData structure are empty. Now they have terminating entry and therefore they have at least one node entry. -- Arthur Zakirov Postgres Professional: http://www.postgrespro.com Russian Postgres Company diff --git a/src/backend/tsearch/spell.c b/src/backend/tsearch/spell.c index 6f5b635413..09297e384c 100644 --- a/src/backend/tsearch/spell.c +++ b/src/backend/tsearch/spell.c @@ -78,6 +78,8 @@ #define tmpalloc(sz) MemoryContextAlloc(Conf->buildCxt, (sz)) #define tmpalloc0(sz) MemoryContextAllocZero(Conf->buildCxt, (sz)) +#define tmpstrdup(str) MemoryContextStrdup(Conf->buildCxt, (str)) + /* * Prepare for constructing an ISpell dictionary. * @@ -498,7 +500,7 @@ NIAddSpell(IspellDict *Conf, const char *word, const char *flag) Conf->Spell[Conf->nspell] = (SPELL *) tmpalloc(SPELLHDRSZ + strlen(word) + 1); strcpy(Conf->Spell[Conf->nspell]->word, word); Conf->Spell[Conf->nspell]->p.flag = (*flag != '\0') - ? cpstrdup(Conf, flag) : VoidString; + ? 
tmpstrdup(flag) : VoidString; Conf->nspell++; } @@ -1040,7 +1042,7 @@ setCompoundAffixFlagValue(IspellDict *Conf, CompoundAffixFlag *entry, entry->flag.i = i; } else - entry->flag.s = cpstrdup(Conf, s); + entry->flag.s = tmpstrdup(s); entry->flagMode = Conf->flagMode; entry->value = val; @@ -1541,6 +1543,9 @@ nextline: return; isnewformat: + pfree(recoded); + pfree(pstr); + if (oldformat) ereport(ERROR, (errcode(ERRCODE_CONFIG_FILE_ERROR), diff --git a/contrib/dict_int/dict_int.c b/contrib/dict_int/dict_int.c index 56ede37089..8dd4959028 100644 --- a/contrib/dict_int/dict_int.c +++ b/contrib/dict_int/dict_int.c @@ -30,7 +30,7 @@ PG_FUNCTION_INFO_V1(dintdict_lexize); Datum dintdict_init(PG_FUNCTION_ARGS) { - List *dictoptions = (List *) PG_GETARG_POINTER(0); + DictInitData *init_data = (DictInitData *) PG_GETARG_POINTER(0); DictInt*d; ListCell *l; @@ -38,7 +38,7 @@ dintdict_init(PG_FUNCTION_ARGS) d->maxlen = 6; d->rejectlong = false; - foreach(l, dictoptions) + foreach(l, init_data->dict_options) { DefElem*defel = (DefElem *) lfirst(l); diff --git a/contrib/dict_xsyn/dict_xsyn.c b/contrib/dict_xsyn/dict_xsyn.c index a79ece240c..0b8a32d459 100644 --- a/contrib/dict_xsyn/dict_xsyn.c +++ b/contrib/dict_xsyn/dict_xsyn.c @@ -140,7 +140,7 @@ read_dictionary(DictSyn *d, const char *filename) Datum dxsyn_init(PG_FUNCTION_ARGS) { - List *dictoptions = (List *) PG_GETARG_POINTER(0); + DictInitData *init_data = (DictInitData *) PG_GETARG_POINTER(0); DictSyn*d; ListCell *l; char *filename = NULL; @@ -153,7 +153,7 @@ dxsyn_init(PG_FUNCTION_ARGS) d->matchsynonyms = false; d->keepsynonyms = true; - foreach(l, dictoptions) + foreach(l, init_data->dict_options) { DefElem*defel = (DefElem *) lfirst(l); diff --git a/contrib/unaccent/unaccent.c b/contrib/unaccent/unaccent.c index 247c202755..2a2fbee5fa 100644 --- a/contrib/unaccent/unaccent.c +++ b/contrib/unaccent/unaccent.c @@ -267,12 +267,12 @@ PG_FUNCTION_INFO_V1(unaccent_init); Datum unaccent_init(PG_FUNCTION_ARGS) { - List 
*dictoptions = (List *) PG_GETARG_POINTER(0); + DictInitData *init_data = (DictInitData *) PG_GETARG_POINTER(0); TrieChar *rootTrie = NULL; bool fileloaded = false; ListCell *l; - foreach(l, dictoptions) + foreach(l, init_data->dict_options) { DefElem*defel = (DefElem *) lfirst(l); diff --git a/src/backend/commands/tsearchcmds.c b/src/backend/commands/tsearchcmds.c index 3a843512d1..3753e32b2c 100644 --- a/src/backend/commands/tsearchcmds.c +++ b/src/backend/commands/tsearchcmds.c @@ -386,17 +386,25 @@ verify_dictoptions(Oid tmplId, List *dictoptions) } else { + DictInitData init_data; + /* * Copy the options just in case init method thinks it can scribble on * them ... */ dictoptions = copyObject(dictoptions); + init_data.dict_options = dictoptions; + init_data.dict.id = InvalidOid; + init_data.dict.xmin = InvalidTransactionId; + init_data.dict.xmax = InvalidTransactionId; + ItemPointerSetInvalid(&init_data.dict.tid); + /* * Call the init method and see if it complains. We don't worry about * it leaking memory,
Re: [PROPOSAL] Shared Ispell dictionaries
On Thu, May 17, 2018 at 02:14:07PM -0400, Robert Haas wrote: > On Thu, May 17, 2018 at 1:52 PM, Tom Lane wrote: > > Do we actually need to worry about unmapping promptly on DROP TEXT > > DICTIONARY? It seems like the only downside of not doing that is that > > we'd leak some address space until process exit. If you were thrashing > > dictionaries at some unreasonable rate on a 32-bit host, you might > > eventually run some sessions out of address space; but that doesn't seem > > like a situation that's so common that we need fragile coding to avoid it. > > I'm not sure what the situation is here. I think this case may arise when you continuously create and drop a lot of dictionaries; different connections work with them concurrently, and at some point a connection stops using text search, so its pinned segments are never unpinned. But I'm not sure whether this is a real-world case. Text search configuration changes should be very infrequent (as noted in the InvalidateTSCacheCallBack commentary). -- Arthur Zakirov Postgres Professional: http://www.postgrespro.com Russian Postgres Company
Re: [PROPOSAL] Shared Ispell dictionaries
On Thu, May 17, 2018 at 10:18:56AM -0400, Tom Lane wrote: > I think the point you've not addressed is that "syscache callback > occurred" does not equate to "object was dropped". Can the code > survive having this occur at any invalidation point? > (CLOBBER_CACHE_ALWAYS testing would soon expose any fallacy there.) Thank you for the idea of testing with CLOBBER_CACHE_ALWAYS. I built postgres with it and ran the regression tests. I tested both approaches, and at first glance they pass the tests. There are no concurrent tests for the text search feature with two or more connections; maybe it would be useful to add such tests. I did this manually, but it would be better to have a script. -- Arthur Zakirov Postgres Professional: http://www.postgrespro.com Russian Postgres Company
Re: [PROPOSAL] Shared Ispell dictionaries
On Thu, May 17, 2018 at 1:52 PM, Tom Lane wrote: > Robert Haas writes: >> On Thu, May 17, 2018 at 10:18 AM, Tom Lane wrote: >>> I think the point you've not addressed is that "syscache callback >>> occurred" does not equate to "object was dropped". Can the code >>> survive having this occur at any invalidation point? >>> (CLOBBER_CACHE_ALWAYS testing would soon expose any fallacy there.) > >> Well, I'm not advocating for a lack of testing, and >> CLOBBER_CACHE_ALWAYS testing is a good idea. However, I suspect that >> calling dsm_detach() from a syscache callback should be fine. >> Obviously there will be trouble if the surrounding code is still using >> that mapping, but that would be a bug at some higher level, like using >> an object without locking it. > > No, you're clearly not getting the point. You could have an absolutely > airtight exclusive lock of any description whatsoever, and that would > provide no guarantee at all that you don't get a cache flush callback. > It's only a cache, not a catalog, and it can get flushed for any reason > or no reason. (That's why we have pin counts on catcache and relcache > entries, rather than assuming that locking the corresponding object is > enough.) So I think it's highly likely that unmapping in a syscache > callback is going to lead quickly to SIGSEGV. The only way it would not > is if we keep the shared dictionary mapped only in short straight-line > code segments that never do any other catalog accesses ... which seems > awkward, inefficient, and error-prone. Yeah, that's true, but again, you can work around that problem. A DSM mapping is fundamentally not that different from a backend-private memory allocation. If you can avoid freeing memory while you're referencing it -- as the catcache and the syscache clearly do -- you can avoid it here, too. > Do we actually need to worry about unmapping promptly on DROP TEXT > DICTIONARY? 
It seems like the only downside of not doing that is that > we'd leak some address space until process exit. If you were thrashing > dictionaries at some unreasonable rate on a 32-bit host, you might > eventually run some sessions out of address space; but that doesn't seem > like a situation that's so common that we need fragile coding to avoid it. I'm not sure what the situation is here. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [PROPOSAL] Shared Ispell dictionaries
Robert Haas writes: > On Thu, May 17, 2018 at 10:18 AM, Tom Lane wrote: >> I think the point you've not addressed is that "syscache callback >> occurred" does not equate to "object was dropped". Can the code >> survive having this occur at any invalidation point? >> (CLOBBER_CACHE_ALWAYS testing would soon expose any fallacy there.) > Well, I'm not advocating for a lack of testing, and > CLOBBER_CACHE_ALWAYS testing is a good idea. However, I suspect that > calling dsm_detach() from a syscache callback should be fine. > Obviously there will be trouble if the surrounding code is still using > that mapping, but that would be a bug at some higher level, like using > an object without locking it. No, you're clearly not getting the point. You could have an absolutely airtight exclusive lock of any description whatsoever, and that would provide no guarantee at all that you don't get a cache flush callback. It's only a cache, not a catalog, and it can get flushed for any reason or no reason. (That's why we have pin counts on catcache and relcache entries, rather than assuming that locking the corresponding object is enough.) So I think it's highly likely that unmapping in a syscache callback is going to lead quickly to SIGSEGV. The only way it would not is if we keep the shared dictionary mapped only in short straight-line code segments that never do any other catalog accesses ... which seems awkward, inefficient, and error-prone. Do we actually need to worry about unmapping promptly on DROP TEXT DICTIONARY? It seems like the only downside of not doing that is that we'd leak some address space until process exit. If you were thrashing dictionaries at some unreasonable rate on a 32-bit host, you might eventually run some sessions out of address space; but that doesn't seem like a situation that's so common that we need fragile coding to avoid it. regards, tom lane
Re: [PROPOSAL] Shared Ispell dictionaries
On Thu, May 17, 2018 at 10:18 AM, Tom Lane wrote: > Robert Haas writes: >> ... Assuming that we can >> convince ourselves that that much is OK, I don't see why using a >> syscache callback to help ensure that the mappings are blown away in >> an at-least-somewhat-timely fashion is worse than any other approach. > > I think the point you've not addressed is that "syscache callback > occurred" does not equate to "object was dropped". Can the code > survive having this occur at any invalidation point? > (CLOBBER_CACHE_ALWAYS testing would soon expose any fallacy there.) Well, I'm not advocating for a lack of testing, and CLOBBER_CACHE_ALWAYS testing is a good idea. However, I suspect that calling dsm_detach() from a syscache callback should be fine. Obviously there will be trouble if the surrounding code is still using that mapping, but that would be a bug at some higher level, like using an object without locking it. And there will be trouble if you register an on_dsm_detach callback that does something strange, but the ones that the core code installs (when you use shm_mq, for example) should be safe. And there will be trouble if you're not careful about memory contexts, because someplace you probably need to remember that you detached from that DSM so you don't try to do it again, and you'd better be sure you have the right context selected when updating your data structures. But it all seems pretty solvable. I think. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [PROPOSAL] Shared Ispell dictionaries
On Thu, May 17, 2018 at 09:57:59AM -0400, Robert Haas wrote: > I think you and Tom have misunderstood each other somehow. If you > look at CommitTransaction(), you will see a comment that says: Oh, I see. You are right. > Also, there is no absolute prohibition on kernel calls in post-commit > cleanup, or in no-fail code in general. Thank you for the explanation! The current approach depends on syscache callbacks anyway. Backend 2 (from the example above) knows whether it is necessary to unpin segments after the syscache callback was called. Tom pointed out below that callbacks occur on various events. So I think I should check the current approach with CLOBBER_CACHE_ALWAYS too; it could reveal some problems in the current patch. Then, if everything is OK, I'll check the other approach (unmapping in the TS syscache callback) with CLOBBER_CACHE_ALWAYS as well. -- Arthur Zakirov Postgres Professional: http://www.postgrespro.com Russian Postgres Company
Re: [PROPOSAL] Shared Ispell dictionaries
Robert Haas writes: > ... Assuming that we can > convince ourselves that that much is OK, I don't see why using a > syscache callback to help ensure that the mappings are blown away in > an at-least-somewhat-timely fashion is worse than any other approach. I think the point you've not addressed is that "syscache callback occurred" does not equate to "object was dropped". Can the code survive having this occur at any invalidation point? (CLOBBER_CACHE_ALWAYS testing would soon expose any fallacy there.) regards, tom lane
Re: [PROPOSAL] Shared Ispell dictionaries
On Wed, May 16, 2018 at 4:42 PM, Arthur Zakirov wrote: > I haven't deep knowledge about guts of invalidation callbacks. It seems > that there is problem with it. Tom pointed above: > >> > I'm not sure that I understood the second case correclty. Can cache >> > invalidation help in this case? I don't have confident knowledge of cache >> > invalidation. It seems to me that InvalidateTSCacheCallBack() should >> > release segment after commit. >> >> "Release after commit" sounds like a pretty dangerous design to me, >> because a release necessarily implies some kernel calls, which could >> fail. We can't afford to inject steps that might fail into post-commit >> cleanup (because it's too late to recover by failing the transaction). >> It'd be better to do cleanup while searching for a dictionary to use. > > But it is possible that I misunderstood his note. I think you and Tom have misunderstood each other somehow. If you look at CommitTransaction(), you will see a comment that says: * This is all post-commit cleanup. Note that if an error is raised here, * it's too late to abort the transaction. This should be just * noncritical resource releasing. Between that point and the end of that function, we shouldn't do anything that throws an error, because the transaction is already committed and it's too late to change our mind. But if session A drops an object, session B is not going to get a callback to InvalidateTSCacheCallBack at that point. It's going to happen sometime in the middle of the transaction, like when it next tries to lock a relation or something. So Tom's complaint is irrelevant in that scenario. Also, there is no absolute prohibition on kernel calls in post-commit cleanup, or in no-fail code in general. For example, the RESOURCE_RELEASE_AFTER_LOCKS phase of resowner cleanup calls FileClose(). 
That's actually completely alarming when you really think about it, because one of the documented return values for close() is EIO, which certainly represents a very dangerous kind of failure -- see nearby threads about fsync-safety. Transaction abort acquires and releases numerous LWLocks, which can result in kernel calls that could fail. We're OK with that because, in practice, it never happens. Unmapping a DSM segment is probably about as safe as acquiring and releasing an LWLock, maybe safer. On my MacBook, the only documented return value for munmap is EINVAL, and any such error would indicate a PostgreSQL bug (or a kernel bug, or a cosmic ray hit). I checked a Linux system; things there are less clear, because mmap and munmap share a single man page, and mmap can fail for all kinds of reasons. But very few of the listed error codes look like things that could legitimately happen during munmap. Also, if munmap did fail (or shmdt/shmctl if using System V shared memory), it would be reported as a WARNING, not an ERROR, so we'd still be sorta OK. I think the only real question here is whether it's safe, at a high level, to drop the object at time T0 and have various backends drop the mapping at unpredictable later times T1, T2, ... all greater than T0. Generally, one wants to remove all references to an object before the object itself, which in this case we can't. Assuming that we can convince ourselves that that much is OK, I don't see why using a syscache callback to help ensure that the mappings are blown away in an at-least-somewhat-timely fashion is worse than any other approach. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [PROPOSAL] Shared Ispell dictionaries
On Wed, May 16, 2018 at 09:33:46AM -0400, Robert Haas wrote: > > In sum, I think the problem is mostly solved. Backend 2 unpins the > > segment in next ts_lexize() call. But if backend 2 doesn't call > > ts_lexize() (or other TS function) anymore the segment will remain mapped. > > It is the only problem I see for now. > > Maybe you could use CacheRegisterSyscacheCallback to get a callback > when the backend notices that a DROP has occurred. Yes, it was the first approach. DSM segments were unpinned in InvalidateTSCacheCallBack() in that approach, which is registered using CacheRegisterSyscacheCallback(). I don't have deep knowledge of the guts of invalidation callbacks. It seems that there is a problem with it. Tom pointed out above: > > I'm not sure that I understood the second case correclty. Can cache > > invalidation help in this case? I don't have confident knowledge of cache > > invalidation. It seems to me that InvalidateTSCacheCallBack() should > > release segment after commit. > > "Release after commit" sounds like a pretty dangerous design to me, > because a release necessarily implies some kernel calls, which could > fail. We can't afford to inject steps that might fail into post-commit > cleanup (because it's too late to recover by failing the transaction). > It'd be better to do cleanup while searching for a dictionary to use. But it is possible that I misunderstood his note. -- Arthur Zakirov Postgres Professional: http://www.postgrespro.com Russian Postgres Company
Re: [PROPOSAL] Shared Ispell dictionaries
On Wed, May 16, 2018 at 7:36 AM, Arthur Zakirov wrote: >> I don't quite understand the problem you're trying to solve here, but: >> >> 1. Unless dsm_pin_segment() is called, a DSM segment will >> automatically be removed when there are no remaining mappings. >> >> 2. Unless dsm_pin_mapping() is called, a DSM segment will be unmapped >> when the currently-in-scope resource owner is cleaned up, like at the >> end of the query. If it is called, then the mapping will stick around >> until the backend exits. > > I tried to solve the case when DSM segment remains mapped even a > dictionary was dropped. It may happen in the following situation: > > Backend 1: > > =# select ts_lexize('english_shared', 'test'); > -- The dictionary is loaded into DSM, the segment and the mapping is > pinned > ... > -- Call ts_lexize() from backend 2 below > =# drop text search dictionary english_shared; > -- The segment and the mapping is unpinned, see ts_dict_shmem_release() > > Backend 2: > > =# select ts_lexize('english_shared', 'test'); > -- The dictionary got from DSM, the mapping is pinned > ... > -- The dictionary was dropped by backend 1, but the mapping still is > pinned Yeah, there's really nothing we can do about that (except switch from processes to threads). There's no way for one process to force another process to unmap something. As you've observed, you can get it to be dropped eventually, but not immediately. > In sum, I think the problem is mostly solved. Backend 2 unpins the > segment in next ts_lexize() call. But if backend 2 doesn't call > ts_lexize() (or other TS function) anymore the segment will remain mapped. > It is the only problem I see for now. Maybe you could use CacheRegisterSyscacheCallback to get a callback when the backend notices that a DROP has occurred. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [PROPOSAL] Shared Ispell dictionaries
Hello, On Tue, May 15, 2018 at 05:02:43PM -0400, Robert Haas wrote: > On Tue, Mar 27, 2018 at 8:19 AM, Arthur Zakirov wrote: > > Yes, there is dsm_pin_mapping() for this. But it is necessary to keep a > > segment even if there are no attached processes. From 0003: > > > > + /* Remain attached until end of postmaster */ > > + dsm_pin_segment(seg); > > + /* Remain attached until end of session */ > > + dsm_pin_mapping(seg); > > I don't quite understand the problem you're trying to solve here, but: > > 1. Unless dsm_pin_segment() is called, a DSM segment will > automatically be removed when there are no remaining mappings. > > 2. Unless dsm_pin_mapping() is called, a DSM segment will be unmapped > when the currently-in-scope resource owner is cleaned up, like at the > end of the query. If it is called, then the mapping will stick around > until the backend exits. I tried to solve the case when a DSM segment remains mapped even after a dictionary was dropped. It may happen in the following situation: Backend 1: =# select ts_lexize('english_shared', 'test'); -- The dictionary is loaded into DSM, the segment and the mapping are pinned ... -- Call ts_lexize() from backend 2 below =# drop text search dictionary english_shared; -- The segment and the mapping are unpinned, see ts_dict_shmem_release() Backend 2: =# select ts_lexize('english_shared', 'test'); -- The dictionary is fetched from DSM, the mapping is pinned ... -- The dictionary was dropped by backend 1, but the mapping is still pinned As you can see, the DSM segment is still pinned by backend 2. Later I fixed it by checking whether we need to unpin segments. In the current version of the patch do_ts_dict_shmem_release() is called in lookup_ts_dictionary_cache(). It unpins segments if the text search cache was invalidated. It unpins all segments, but I think it is ok since text search changes should be infrequent. 
> If you pin the mapping or the segment and later no longer want it > pinned, there are dsm_unpin_mapping() and dsm_unpin_segment() > functions available, too. So it seems like what you might want to do > is pin the segment when it's created, and then unpin it if it's > stale/obsolete. The latter won't remove it immediately, but will once > all the mappings are gone. Yes, dsm_unpin_mapping() and dsm_unpin_segment() will be called when the dictionary is dropped or altered in the current version of the patch. I described the approach above. In sum, I think the problem is mostly solved. Backend 2 unpins the segment in the next ts_lexize() call. But if backend 2 doesn't call ts_lexize() (or other TS function) anymore, the segment will remain mapped. It is the only problem I see for now. I hope the description is clear. I attached the rebased patch. -- Arthur Zakirov Postgres Professional: http://www.postgrespro.com Russian Postgres Company diff --git a/src/backend/tsearch/spell.c b/src/backend/tsearch/spell.c index 6f5b635413..09297e384c 100644 --- a/src/backend/tsearch/spell.c +++ b/src/backend/tsearch/spell.c @@ -78,6 +78,8 @@ #define tmpalloc(sz) MemoryContextAlloc(Conf->buildCxt, (sz)) #define tmpalloc0(sz) MemoryContextAllocZero(Conf->buildCxt, (sz)) +#define tmpstrdup(str) MemoryContextStrdup(Conf->buildCxt, (str)) + /* * Prepare for constructing an ISpell dictionary. * @@ -498,7 +500,7 @@ NIAddSpell(IspellDict *Conf, const char *word, const char *flag) Conf->Spell[Conf->nspell] = (SPELL *) tmpalloc(SPELLHDRSZ + strlen(word) + 1); strcpy(Conf->Spell[Conf->nspell]->word, word); Conf->Spell[Conf->nspell]->p.flag = (*flag != '\0') - ? cpstrdup(Conf, flag) : VoidString; + ? 
tmpstrdup(flag) : VoidString; Conf->nspell++; } @@ -1040,7 +1042,7 @@ setCompoundAffixFlagValue(IspellDict *Conf, CompoundAffixFlag *entry, entry->flag.i = i; } else - entry->flag.s = cpstrdup(Conf, s); + entry->flag.s = tmpstrdup(s); entry->flagMode = Conf->flagMode; entry->value = val; @@ -1541,6 +1543,9 @@ nextline: return; isnewformat: + pfree(recoded); + pfree(pstr); + if (oldformat) ereport(ERROR, (errcode(ERRCODE_CONFIG_FILE_ERROR), diff --git a/contrib/dict_int/dict_int.c b/contrib/dict_int/dict_int.c index 56ede37089..8dd4959028 100644 --- a/contrib/dict_int/dict_int.c +++ b/contrib/dict_int/dict_int.c @@ -30,7 +30,7 @@ PG_FUNCTION_INFO_V1(dintdict_lexize); Datum dintdict_init(PG_FUNCTION_ARGS) { - List *dictoptions = (List *) PG_GETARG_POINTER(0); + DictInitData *init_data = (DictInitData *) PG_GETARG_POINTER(0); DictInt*d; ListCell *l; @@ -38,7 +38,7 @@ dintdict_init(PG_FUNCTION_ARGS) d->maxlen = 6; d->rejectlong = false; - foreach(l, dictoptions) + foreach(l, init_data->dict_options) { DefElem*defel =
Re: [PROPOSAL] Shared Ispell dictionaries
On Tue, Mar 27, 2018 at 8:19 AM, Arthur Zakirov wrote: >> I assume the DSM infrastructure already has some solution for getting >> rid of DSM segments when the last interested process disconnects, >> so maybe you could piggyback on that somehow. > > Yes, there is dsm_pin_mapping() for this. But it is necessary to keep a > segment even if there are no attached processes. From 0003: > > + /* Remain attached until end of postmaster */ > + dsm_pin_segment(seg); > + /* Remain attached until end of session */ > + dsm_pin_mapping(seg); I don't quite understand the problem you're trying to solve here, but: 1. Unless dsm_pin_segment() is called, a DSM segment will automatically be removed when there are no remaining mappings. 2. Unless dsm_pin_mapping() is called, a DSM segment will be unmapped when the currently-in-scope resource owner is cleaned up, like at the end of the query. If it is called, then the mapping will stick around until the backend exits. If you pin the mapping or the segment and later no longer want it pinned, there are dsm_unpin_mapping() and dsm_unpin_segment() functions available, too. So it seems like what you might want to do is pin the segment when it's created, and then unpin it if it's stale/obsolete. The latter won't remove it immediately, but will once all the mappings are gone. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [PROPOSAL] Shared Ispell dictionaries
On Thu, Mar 29, 2018 at 02:03:07AM +0300, Arthur Zakirov wrote: > Here is the new version of the patch. Please find the attached new version of the patch. I removed refcnt because it is useless: it doesn't guarantee that a hash table entry will be removed. I fixed a bug: dsm_unpin_segment() could be called twice if the transaction that called it was aborted and another transaction then calls ts_dict_shmem_release(). I added segment_ispinned to fix it. -- Arthur Zakirov Postgres Professional: http://www.postgrespro.com Russian Postgres Company diff --git a/src/backend/tsearch/spell.c b/src/backend/tsearch/spell.c index b9fdd77e19..e071994523 100644 --- a/src/backend/tsearch/spell.c +++ b/src/backend/tsearch/spell.c @@ -78,6 +78,8 @@ #define tmpalloc(sz) MemoryContextAlloc(Conf->buildCxt, (sz)) #define tmpalloc0(sz) MemoryContextAllocZero(Conf->buildCxt, (sz)) +#define tmpstrdup(str) MemoryContextStrdup(Conf->buildCxt, (str)) + /* * Prepare for constructing an ISpell dictionary. * @@ -498,7 +500,7 @@ NIAddSpell(IspellDict *Conf, const char *word, const char *flag) Conf->Spell[Conf->nspell] = (SPELL *) tmpalloc(SPELLHDRSZ + strlen(word) + 1); strcpy(Conf->Spell[Conf->nspell]->word, word); Conf->Spell[Conf->nspell]->p.flag = (*flag != '\0') - ? cpstrdup(Conf, flag) : VoidString; + ? 
tmpstrdup(flag) : VoidString; Conf->nspell++; } @@ -1040,7 +1042,7 @@ setCompoundAffixFlagValue(IspellDict *Conf, CompoundAffixFlag *entry, entry->flag.i = i; } else - entry->flag.s = cpstrdup(Conf, s); + entry->flag.s = tmpstrdup(s); entry->flagMode = Conf->flagMode; entry->value = val; @@ -1536,6 +1538,9 @@ nextline: return; isnewformat: + pfree(recoded); + pfree(pstr); + if (oldformat) ereport(ERROR, (errcode(ERRCODE_CONFIG_FILE_ERROR), diff --git a/contrib/dict_int/dict_int.c b/contrib/dict_int/dict_int.c index 56ede37089..8dd4959028 100644 --- a/contrib/dict_int/dict_int.c +++ b/contrib/dict_int/dict_int.c @@ -30,7 +30,7 @@ PG_FUNCTION_INFO_V1(dintdict_lexize); Datum dintdict_init(PG_FUNCTION_ARGS) { - List *dictoptions = (List *) PG_GETARG_POINTER(0); + DictInitData *init_data = (DictInitData *) PG_GETARG_POINTER(0); DictInt*d; ListCell *l; @@ -38,7 +38,7 @@ dintdict_init(PG_FUNCTION_ARGS) d->maxlen = 6; d->rejectlong = false; - foreach(l, dictoptions) + foreach(l, init_data->dict_options) { DefElem*defel = (DefElem *) lfirst(l); diff --git a/contrib/dict_xsyn/dict_xsyn.c b/contrib/dict_xsyn/dict_xsyn.c index a79ece240c..0b8a32d459 100644 --- a/contrib/dict_xsyn/dict_xsyn.c +++ b/contrib/dict_xsyn/dict_xsyn.c @@ -140,7 +140,7 @@ read_dictionary(DictSyn *d, const char *filename) Datum dxsyn_init(PG_FUNCTION_ARGS) { - List *dictoptions = (List *) PG_GETARG_POINTER(0); + DictInitData *init_data = (DictInitData *) PG_GETARG_POINTER(0); DictSyn*d; ListCell *l; char *filename = NULL; @@ -153,7 +153,7 @@ dxsyn_init(PG_FUNCTION_ARGS) d->matchsynonyms = false; d->keepsynonyms = true; - foreach(l, dictoptions) + foreach(l, init_data->dict_options) { DefElem*defel = (DefElem *) lfirst(l); diff --git a/contrib/unaccent/unaccent.c b/contrib/unaccent/unaccent.c index 247c202755..2a2fbee5fa 100644 --- a/contrib/unaccent/unaccent.c +++ b/contrib/unaccent/unaccent.c @@ -267,12 +267,12 @@ PG_FUNCTION_INFO_V1(unaccent_init); Datum unaccent_init(PG_FUNCTION_ARGS) { - List 
*dictoptions = (List *) PG_GETARG_POINTER(0); + DictInitData *init_data = (DictInitData *) PG_GETARG_POINTER(0); TrieChar *rootTrie = NULL; boolfileloaded = false; ListCell *l; - foreach(l, dictoptions) + foreach(l, init_data->dict_options) { DefElem*defel = (DefElem *) lfirst(l); diff --git a/src/backend/commands/tsearchcmds.c b/src/backend/commands/tsearchcmds.c index 3a843512d1..3753e32b2c 100644 --- a/src/backend/commands/tsearchcmds.c +++ b/src/backend/commands/tsearchcmds.c @@ -386,17 +386,25 @@ verify_dictoptions(Oid tmplId, List *dictoptions) } else { + DictInitData init_data; + /* * Copy the options just in case init method thinks it can scribble on * them ... */ dictoptions = copyObject(dictoptions); + init_data.dict_options = dictoptions; + init_data.dict.id = InvalidOid; + init_data.dict.xmin = InvalidTransactionId; + init_data.dict.xmax = InvalidTransactionId; + ItemPointerSetInvalid(_data.dict.tid); + /*
Re: [PROPOSAL] Shared Ispell dictionaries
Tomas Vondra wrote: > > On 03/31/2018 12:42 PM, Arthur Zakirov wrote: > > Hello all, > > > > I'd like to add new optional function to text search template named fini > > in addition to init() and lexize(). It will be called by > > RemoveTSDictionaryById() and AlterTSDictionary(). dispell_fini() will > > call ts_dict_shmem_release(). > > > > It doesn't change segments leaking situation. I think it makes text > > search API more transparent. > > > > If it doesn't actually solve the problem, why add it? I don't see a > point in adding functions for the sake of transparency, when it does not > in fact serve any use cases. It doesn't solve the problem. But it brings more clarity: if a dictionary requested a shared location, then it should release/unpin it. There is no such scenario yet, but someone might want to release not only the shared segment but also other private data. > Can't we handle the segment-leaking by adding some sort of tombstone? It is interesting that there are such tombstones already, without the patch. TSDictionaryCacheEntry entries aren't deleted after DROP; they are just marked isvalid = false. > For example, imagine that instead of removing the hash table entry we > mark it as 'dropped'. And after that, after the lookup we would know the > dictionary was removed, and the backends would load the dictionary into > their private memory. > > Of course, this could mean we end up having many tombstones in the hash > table. But those tombstones would be tiny, making it less painful than > potentially leaking much more memory for the dictionaries. Now it actually isn't guaranteed that the hash table entry will be removed, even if refcnt is 0. So I think I should remove refcnt and entries won't be removed. There are no big problems with leaking now. Memory may leak only if a dictionary was dropped or altered, there is no text search workload anymore, and the backend is still alive. 
Because the next use of text search functions will unpin segments previously used by invalid dictionaries (isvalid == false). Also, the segment is unpinned when the backend terminates. The segment is destroyed when all interested processes unpin it (as Tom noted), and the hash table entry becomes a tombstone. I hope I described it clearly. > Also, I wonder if we might actually remove the dictionaries after a > while, e.g. based on XID. Imagine that we note the XID of the > transaction removing the dictionary, or perhaps XID of the most recent > running transaction. Then we could use this to decide if all running > transactions actually see the DROP, and we could remove the tombstone. Maybe autovacuum should work here too :) It is a joke, of course. I'm not very familiar with removing dead tuples, but I think this is a similar case. -- Arthur Zakirov Postgres Professional: http://www.postgrespro.com Russian Postgres Company
Re: [PROPOSAL] Shared Ispell dictionaries
On 03/31/2018 12:42 PM, Arthur Zakirov wrote: > Hello all, > > I'd like to add new optional function to text search template named fini > in addition to init() and lexize(). It will be called by > RemoveTSDictionaryById() and AlterTSDictionary(). dispell_fini() will > call ts_dict_shmem_release(). > > It doesn't change segments leaking situation. I think it makes text > search API more transparent. > If it doesn't actually solve the problem, why add it? I don't see a point in adding functions for the sake of transparency, when it does not in fact serve any use cases. Can't we handle the segment-leaking by adding some sort of tombstone? For example, imagine that instead of removing the hash table entry we mark it as 'dropped'. And after that, after the lookup we would know the dictionary was removed, and the backends would load the dictionary into their private memory. Of course, this could mean we end up having many tombstones in the hash table. But those tombstones would be tiny, making it less painful than potentially leaking much more memory for the dictionaries. Also, I wonder if we might actually remove the dictionaries after a while, e.g. based on XID. Imagine that we note the XID of the transaction removing the dictionary, or perhaps XID of the most recent running transaction. Then we could use this to decide if all running transactions actually see the DROP, and we could remove the tombstone. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PROPOSAL] Shared Ispell dictionaries
Hello all, I'd like to add a new optional function, named fini, to the text search template in addition to init() and lexize(). It will be called by RemoveTSDictionaryById() and AlterTSDictionary(). dispell_fini() will call ts_dict_shmem_release(). It doesn't change the segment-leaking situation. I think it makes the text search API more transparent. I'll update the existing documentation. And I think I can add text search API documentation in the 2018-09 commitfest, as Tom noted that it doesn't exist. Any thoughts? -- Arthur Zakirov Postgres Professional: http://www.postgrespro.com Russian Postgres Company
Re: [PROPOSAL] Shared Ispell dictionaries
Here is the new version of the patch. Now RemoveTSDictionaryById() and AlterTSDictionary() unpin the dictionary DSM segment. So if all attached backends disconnect, allocated DSM segments will be released. lookup_ts_dictionary_cache() may unpin the DSM mapping for all invalid dictionary cache entries. I added xmax in DictPointerData. It is used as a lookup key now too. It helps to reload a dictionary after a rolled-back DROP command. There was a bug in ts_dict_shmem_location(); I fixed it. -- Arthur Zakirov Postgres Professional: http://www.postgrespro.com Russian Postgres Company diff --git a/src/backend/tsearch/spell.c b/src/backend/tsearch/spell.c index b9fdd77e19..e071994523 100644 --- a/src/backend/tsearch/spell.c +++ b/src/backend/tsearch/spell.c @@ -78,6 +78,8 @@ #define tmpalloc(sz) MemoryContextAlloc(Conf->buildCxt, (sz)) #define tmpalloc0(sz) MemoryContextAllocZero(Conf->buildCxt, (sz)) +#define tmpstrdup(str) MemoryContextStrdup(Conf->buildCxt, (str)) + /* * Prepare for constructing an ISpell dictionary. * @@ -498,7 +500,7 @@ NIAddSpell(IspellDict *Conf, const char *word, const char *flag) Conf->Spell[Conf->nspell] = (SPELL *) tmpalloc(SPELLHDRSZ + strlen(word) + 1); strcpy(Conf->Spell[Conf->nspell]->word, word); Conf->Spell[Conf->nspell]->p.flag = (*flag != '\0') - ? cpstrdup(Conf, flag) : VoidString; + ? 
tmpstrdup(flag) : VoidString; Conf->nspell++; } @@ -1040,7 +1042,7 @@ setCompoundAffixFlagValue(IspellDict *Conf, CompoundAffixFlag *entry, entry->flag.i = i; } else - entry->flag.s = cpstrdup(Conf, s); + entry->flag.s = tmpstrdup(s); entry->flagMode = Conf->flagMode; entry->value = val; @@ -1536,6 +1538,9 @@ nextline: return; isnewformat: + pfree(recoded); + pfree(pstr); + if (oldformat) ereport(ERROR, (errcode(ERRCODE_CONFIG_FILE_ERROR), diff --git a/contrib/dict_int/dict_int.c b/contrib/dict_int/dict_int.c index 56ede37089..8dd4959028 100644 --- a/contrib/dict_int/dict_int.c +++ b/contrib/dict_int/dict_int.c @@ -30,7 +30,7 @@ PG_FUNCTION_INFO_V1(dintdict_lexize); Datum dintdict_init(PG_FUNCTION_ARGS) { - List *dictoptions = (List *) PG_GETARG_POINTER(0); + DictInitData *init_data = (DictInitData *) PG_GETARG_POINTER(0); DictInt*d; ListCell *l; @@ -38,7 +38,7 @@ dintdict_init(PG_FUNCTION_ARGS) d->maxlen = 6; d->rejectlong = false; - foreach(l, dictoptions) + foreach(l, init_data->dict_options) { DefElem*defel = (DefElem *) lfirst(l); diff --git a/contrib/dict_xsyn/dict_xsyn.c b/contrib/dict_xsyn/dict_xsyn.c index a79ece240c..0b8a32d459 100644 --- a/contrib/dict_xsyn/dict_xsyn.c +++ b/contrib/dict_xsyn/dict_xsyn.c @@ -140,7 +140,7 @@ read_dictionary(DictSyn *d, const char *filename) Datum dxsyn_init(PG_FUNCTION_ARGS) { - List *dictoptions = (List *) PG_GETARG_POINTER(0); + DictInitData *init_data = (DictInitData *) PG_GETARG_POINTER(0); DictSyn*d; ListCell *l; char *filename = NULL; @@ -153,7 +153,7 @@ dxsyn_init(PG_FUNCTION_ARGS) d->matchsynonyms = false; d->keepsynonyms = true; - foreach(l, dictoptions) + foreach(l, init_data->dict_options) { DefElem*defel = (DefElem *) lfirst(l); diff --git a/contrib/unaccent/unaccent.c b/contrib/unaccent/unaccent.c index 247c202755..2a2fbee5fa 100644 --- a/contrib/unaccent/unaccent.c +++ b/contrib/unaccent/unaccent.c @@ -267,12 +267,12 @@ PG_FUNCTION_INFO_V1(unaccent_init); Datum unaccent_init(PG_FUNCTION_ARGS) { - List 
*dictoptions = (List *) PG_GETARG_POINTER(0); + DictInitData *init_data = (DictInitData *) PG_GETARG_POINTER(0); TrieChar *rootTrie = NULL; boolfileloaded = false; ListCell *l; - foreach(l, dictoptions) + foreach(l, init_data->dict_options) { DefElem*defel = (DefElem *) lfirst(l); diff --git a/src/backend/commands/tsearchcmds.c b/src/backend/commands/tsearchcmds.c index 3a843512d1..3753e32b2c 100644 --- a/src/backend/commands/tsearchcmds.c +++ b/src/backend/commands/tsearchcmds.c @@ -386,17 +386,25 @@ verify_dictoptions(Oid tmplId, List *dictoptions) } else { + DictInitData init_data; + /* * Copy the options just in case init method thinks it can scribble on * them ... */ dictoptions = copyObject(dictoptions); + init_data.dict_options = dictoptions; + init_data.dict.id = InvalidOid; + init_data.dict.xmin = InvalidTransactionId; + init_data.dict.xmax = InvalidTransactionId; +
Re: [PROPOSAL] Shared Ispell dictionaries
Please find the attached new version of the patch. I got rid of the 0005 and 0006 parts. There is no max_shared_dictionaries_size variable, Shareable option, or pg_ts_shared_dictionaries view anymore. On Sat, Mar 24, 2018 at 04:56:36PM -0400, Tom Lane wrote: > I do think it's required that changing the dictionary's options with > ALTER TEXT SEARCH DICTIONARY automatically cause a reload; but if that's > happening with this patch, I don't see where. (It might work to use > the combination of dictionary OID and TID of the dictionary's pg_ts_dict > tuple as the lookup key for shared dictionaries. Oh, and have you > thought about the possibility of conflicting OIDs in different DBs? > Probably the database OID has to be part of the key, as well.) The database OID, the dictionary OID, TID and XMIN are now used as the lookup key. > Also, the scheme for releasing the dictionary DSM during > RemoveTSDictionaryById is uncertain and full of race conditions: > the DROP might roll back later, or someone might come along and > start using the dictionary (causing a fresh DSM load) before the > DROP commits and makes the dictionary invisible to other sessions. > I don't think that either of those are necessarily fatal objections, > but there needs to be some commentary there explaining what happens. The dictionary's DSM segment now stays alive until the postmaster terminates. But when the dictionary is dropped or altered, the previous (now invalid) segment is unpinned. The segment itself is released when all backends unpin the mapping in lookup_ts_parser_cache() or disconnect. The problem arises when some process used the dictionary before it was dropped or altered, doesn't use it afterwards, and lives for a very long time. In this situation the mapping isn't unpinned and the segment isn't released. The other problem is that the TsearchDictEntry isn't removed if ts_dict_shmem_release() wasn't called. That may happen after dropping the dictionary. 
> BTW, I was going to complain that this patch alters the API for > dictionary template init functions without any documentation updates; > but then I realized that there isn't any documentation to update. > That pretty well sucks, but I suppose it's not the job of this patch > to improve that situation. Still, you could spend a bit more effort on > the commentary in ts_public.h in 0002, because that commentary is as > close to an API spec as we've got. I improved a little bit the commentary in ts_public.h. -- Arthur Zakirov Postgres Professional: http://www.postgrespro.com Russian Postgres Company diff --git a/src/backend/tsearch/spell.c b/src/backend/tsearch/spell.c index b9fdd77e19..e071994523 100644 --- a/src/backend/tsearch/spell.c +++ b/src/backend/tsearch/spell.c @@ -78,6 +78,8 @@ #define tmpalloc(sz) MemoryContextAlloc(Conf->buildCxt, (sz)) #define tmpalloc0(sz) MemoryContextAllocZero(Conf->buildCxt, (sz)) +#define tmpstrdup(str) MemoryContextStrdup(Conf->buildCxt, (str)) + /* * Prepare for constructing an ISpell dictionary. * @@ -498,7 +500,7 @@ NIAddSpell(IspellDict *Conf, const char *word, const char *flag) Conf->Spell[Conf->nspell] = (SPELL *) tmpalloc(SPELLHDRSZ + strlen(word) + 1); strcpy(Conf->Spell[Conf->nspell]->word, word); Conf->Spell[Conf->nspell]->p.flag = (*flag != '\0') - ? cpstrdup(Conf, flag) : VoidString; + ? 
tmpstrdup(flag) : VoidString; Conf->nspell++; } @@ -1040,7 +1042,7 @@ setCompoundAffixFlagValue(IspellDict *Conf, CompoundAffixFlag *entry, entry->flag.i = i; } else - entry->flag.s = cpstrdup(Conf, s); + entry->flag.s = tmpstrdup(s); entry->flagMode = Conf->flagMode; entry->value = val; @@ -1536,6 +1538,9 @@ nextline: return; isnewformat: + pfree(recoded); + pfree(pstr); + if (oldformat) ereport(ERROR, (errcode(ERRCODE_CONFIG_FILE_ERROR), diff --git a/contrib/dict_int/dict_int.c b/contrib/dict_int/dict_int.c index 56ede37089..8dd4959028 100644 --- a/contrib/dict_int/dict_int.c +++ b/contrib/dict_int/dict_int.c @@ -30,7 +30,7 @@ PG_FUNCTION_INFO_V1(dintdict_lexize); Datum dintdict_init(PG_FUNCTION_ARGS) { - List *dictoptions = (List *) PG_GETARG_POINTER(0); + DictInitData *init_data = (DictInitData *) PG_GETARG_POINTER(0); DictInt*d; ListCell *l; @@ -38,7 +38,7 @@ dintdict_init(PG_FUNCTION_ARGS) d->maxlen = 6; d->rejectlong = false; - foreach(l, dictoptions) + foreach(l, init_data->dict_options) { DefElem*defel = (DefElem *) lfirst(l); diff --git a/contrib/dict_xsyn/dict_xsyn.c b/contrib/dict_xsyn/dict_xsyn.c index a79ece240c..0b8a32d459 100644 --- a/contrib/dict_xsyn/dict_xsyn.c +++ b/contrib/dict_xsyn/dict_xsyn.c @@ -140,7 +140,7 @@ read_dictionary(DictSyn *d, const char *filename) Datum
Re: [PROPOSAL] Shared Ispell dictionaries
On Mon, Mar 26, 2018 at 11:27:48AM -0400, Tom Lane wrote:
> Arthur Zakirov writes:
> > I'm not sure that I understood the second case correctly. Can cache
> > invalidation help in this case? I don't have confident knowledge of cache
> > invalidation. It seems to me that InvalidateTSCacheCallBack() should
> > release the segment after commit.
>
> "Release after commit" sounds like a pretty dangerous design to me,
> because a release necessarily implies some kernel calls, which could
> fail.  We can't afford to inject steps that might fail into post-commit
> cleanup (because it's too late to recover by failing the transaction).
> It'd be better to do cleanup while searching for a dictionary to use.
>
> I assume the DSM infrastructure already has some solution for getting
> rid of DSM segments when the last interested process disconnects,
> so maybe you could piggyback on that somehow.

Yes, there is dsm_pin_mapping() for this. But we also need to keep a
segment alive even when no process is attached to it. From 0003:

+	/* Remain attached until end of postmaster */
+	dsm_pin_segment(seg);
+	/* Remain attached until end of session */
+	dsm_pin_mapping(seg);

-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
Re: [PROPOSAL] Shared Ispell dictionaries
Arthur Zakirov writes:
> On Sun, Mar 25, 2018 at 12:18:10AM -0400, Tom Lane wrote:
>> My thought was (a) the ROLLBACK case is ok, because the next use of
>> the dictionary will reload it, and (b) the reload-concurrently-with-
>> DROP case is annoying, because indeed it leaks, but the window is small
>> and it probably won't be an issue in practice.  We would need to be
>> sure that the DSM segment goes away at postmaster restart, but given
>> that I think it'd be tolerable.  Of course it'd be better not to have
>> the race, but I see no easy way to prevent it -- do you?

> I'm not sure that I understood the second case correctly. Can cache
> invalidation help in this case? I don't have confident knowledge of cache
> invalidation. It seems to me that InvalidateTSCacheCallBack() should
> release the segment after commit.

"Release after commit" sounds like a pretty dangerous design to me,
because a release necessarily implies some kernel calls, which could
fail.  We can't afford to inject steps that might fail into post-commit
cleanup (because it's too late to recover by failing the transaction).
It'd be better to do cleanup while searching for a dictionary to use.

I assume the DSM infrastructure already has some solution for getting
rid of DSM segments when the last interested process disconnects,
so maybe you could piggyback on that somehow.

			regards, tom lane
Re: [PROPOSAL] Shared Ispell dictionaries
On Sun, Mar 25, 2018 at 02:28:29PM -0400, Tom Lane wrote:
> Arthur Zakirov writes:
> > If all dictionaries will be shareable then this view could be removed.
> > Unfortunately I think it can't help with leaked segments, I didn't find
> > a way to iterate dshash entries. That's why pg_ts_shared_dictionaries()
> > scans the pg_ts_dict table instead of scanning the dshash table.
>
> If you're scanning pg_ts_dict, what happens with dictionaries belonging
> to other databases?  They won't be visible in your local copy of
> pg_ts_dict.  Between that and the inability to find leaked segments,
> I'm not seeing that this has much use-case.

Indeed, scanning pg_ts_dict is the wrong approach here, and
pg_ts_shared_dictionaries() is definitely broken.

> > Yes unfortunately ALTER TEXT SEARCH DICTIONARY doesn't reload a
> > dictionary. TID can help here. I thought about using XID too when I
> > started to work on the RELOAD command. But I'm not sure that it is a
> > good idea; anyway, XID isn't needed in the current version.
>
> Actually, existing practice is to check both xmin and tid; see for example
> where plpgsql checks if a cached function data structure still matches the
> pg_proc row, pl_comp.c around line 175 in HEAD.  The other PLs do it
> similarly I think.  I'm not sure offhand just how much that changes the
> risks of a false match compared to testing only one of these fields, but
> I'd recommend conforming to the way it's done elsewhere.

Thank you for pointing that out! I think it shouldn't be hard to use both
xmin and tid.

-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
Re: [PROPOSAL] Shared Ispell dictionaries
On Sun, Mar 25, 2018 at 12:18:10AM -0400, Tom Lane wrote:
> My thought was (a) the ROLLBACK case is ok, because the next use of
> the dictionary will reload it, and (b) the reload-concurrently-with-
> DROP case is annoying, because indeed it leaks, but the window is small
> and it probably won't be an issue in practice.  We would need to be
> sure that the DSM segment goes away at postmaster restart, but given
> that I think it'd be tolerable.  Of course it'd be better not to have
> the race, but I see no easy way to prevent it -- do you?

I'm not sure that I understood the second case correctly. Can cache
invalidation help in this case? I don't have confident knowledge of cache
invalidation, but it seems to me that InvalidateTSCacheCallBack() should
release the segment after commit.

But the cache isn't invalidated if a backend was terminated after a
dictionary reload. on_shmem_exit() could help, but we would need a list
of leaked dictionaries for that.

P.S. I don't think it is right to release all dictionary segments in
InvalidateTSCacheCallBack(): otherwise any DROP could release all
segments. It would be better to release only the specific dictionary.

-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
Re: [PROPOSAL] Shared Ispell dictionaries
Arthur Zakirov writes:
> On Sat, Mar 24, 2018 at 04:56:36PM -0400, Tom Lane wrote:
>> * And that leads us to not particularly need a view telling which
>> dictionaries are loaded, either.  It's just an implementation detail
>> that users don't need to worry about.

> If all dictionaries will be shareable then this view could be removed.
> Unfortunately I think it can't help with leaked segments, I didn't find
> a way to iterate dshash entries. That's why pg_ts_shared_dictionaries()
> scans the pg_ts_dict table instead of scanning the dshash table.

If you're scanning pg_ts_dict, what happens with dictionaries belonging
to other databases?  They won't be visible in your local copy of
pg_ts_dict.  Between that and the inability to find leaked segments,
I'm not seeing that this has much use-case.

>> (It might work to use
>> the combination of dictionary OID and TID of the dictionary's pg_ts_dict
>> tuple as the lookup key for shared dictionaries.  Oh, and have you
>> thought about the possibility of conflicting OIDs in different DBs?
>> Probably the database OID has to be part of the key, as well.)

> Yes unfortunately ALTER TEXT SEARCH DICTIONARY doesn't reload a
> dictionary. TID can help here. I thought about using XID too when I
> started to work on the RELOAD command. But I'm not sure that it is a
> good idea; anyway, XID isn't needed in the current version.

Actually, existing practice is to check both xmin and tid; see for example
where plpgsql checks if a cached function data structure still matches the
pg_proc row, pl_comp.c around line 175 in HEAD.  The other PLs do it
similarly I think.  I'm not sure offhand just how much that changes the
risks of a false match compared to testing only one of these fields, but
I'd recommend conforming to the way it's done elsewhere.

			regards, tom lane
Re: [PROPOSAL] Shared Ispell dictionaries
On Sun, Mar 25, 2018 at 06:45:08AM +0200, Tomas Vondra wrote:
> FWIW this is where the view listing dictionaries loaded into shared
> memory would be helpful - you'd at least know there's a dictionary,
> wasting memory.

Unfortunately, it seems that this view can't help with listing leaked
segments: I didn't find a way to list dshash entries. Currently
pg_ts_shared_dictionaries() scans the pg_ts_dict table and gets a dshash
item using dictId. In the case of leaked dictionaries we don't know
their identifiers.

-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
Re: [PROPOSAL] Shared Ispell dictionaries
On Sat, Mar 24, 2018 at 04:56:36PM -0400, Tom Lane wrote:
> Arthur Zakirov writes:
> > [ v10 patch versions ]

Thank you for the review, Tom!

Tomas Vondra wrote:
> Tom Lane wrote:
>> * I cannot imagine a use-case for setting max_shared_dictionaries_size
>> to anything except "unlimited".  If it's not that, and you exceed it,
>> then subsequent backends load private copies of the dictionary, making
>> your memory situation rapidly worse not better.  I think we should lose
>> that GUC altogether and just load dictionaries automatically.
>
> Introduction of that limit is likely my fault. It came from an
> extension I wrote a long time ago, but back then it was a necessity
> because we did not have DSM. So in retrospect I agree with you - it's
> not particularly useful and we should ditch it.
>
> Arthur, let this be a lesson for you! You have to start fighting
> against bogus feature requests from other people ;-)

Yeah, in this sense max_shared_dictionaries_size is pointless. I'll
remove it then :).

> * Similarly, I see no point in a "sharable" option on individual
> dictionaries, especially when there's only one allowed setting for
> most dictionary types.  Let's lose that too.

I think the "Shareable" option could be useful only if building a shared
dictionary took much longer than building a non-shared one. It is
slightly longer because of an additional memcpy(), but the difference
isn't noticeable, I think. So it is worth removing the option.

> * And that leads us to not particularly need a view telling which
> dictionaries are loaded, either.  It's just an implementation detail
> that users don't need to worry about.

If all dictionaries are shareable then this view can be removed.
Unfortunately, I don't think it can help with leaked segments: I didn't
find a way to iterate dshash entries. That's why
pg_ts_shared_dictionaries() scans the pg_ts_dict table instead of
scanning the dshash table.
> I do think it's required that changing the dictionary's options with
> ALTER TEXT SEARCH DICTIONARY automatically cause a reload; but if that's
> happening with this patch, I don't see where.  (It might work to use
> the combination of dictionary OID and TID of the dictionary's pg_ts_dict
> tuple as the lookup key for shared dictionaries.  Oh, and have you
> thought about the possibility of conflicting OIDs in different DBs?
> Probably the database OID has to be part of the key, as well.)

Yes, unfortunately ALTER TEXT SEARCH DICTIONARY doesn't reload a
dictionary. TID can help here. I thought about using XID too when I
started to work on the RELOAD command, but I'm not sure that it is a
good idea; anyway, XID isn't needed in the current version.

> Also, the scheme for releasing the dictionary DSM during
> RemoveTSDictionaryById is uncertain and full of race conditions:
> the DROP might roll back later, or someone might come along and
> start using the dictionary (causing a fresh DSM load) before the
> DROP commits and makes the dictionary invisible to other sessions.
> I don't think that either of those are necessarily fatal objections,
> but there needs to be some commentary there explaining what happens.

I missed this case. As you wrote below, the ROLLBACK case is OK, but I
don't have a solution for the second case yet. If I can't solve it I'll
add additional comments in RemoveTSConfigurationById() and maybe in the
documentation, if that's appropriate.

> BTW, I was going to complain that this patch alters the API for
> dictionary template init functions without any documentation updates;
> but then I realized that there isn't any documentation to update.
> That pretty well sucks, but I suppose it's not the job of this patch
> to improve that situation.  Still, you could spend a bit more effort on
> the commentary in ts_public.h in 0002, because that commentary is as
> close to an API spec as we've got.

I'll fix the comments.
-- Arthur Zakirov Postgres Professional: http://www.postgrespro.com Russian Postgres Company
Re: [PROPOSAL] Shared Ispell dictionaries
Tomas Vondra writes:
> FWIW this is where the view listing dictionaries loaded into shared
> memory would be helpful - you'd at least know there's a dictionary,
> wasting memory.

Well, that's only because we failed to make the implementation
transparent :-(.  But it's not unlikely that an mmap-based implementation
would be simply incapable of supporting such a view: the knowledge of
whether a particular file is mapped in would be pretty much
process-local, I think.  So I'd really rather we don't add that.

Also, while these dictionaries are indeed kind of large relative to our
traditional view of shared memory, if they're in DSM segments that the
kernel can swap out then I really suspect that nobody would much care if
a few such segments had been leaked.  I find it hard to imagine a
use-case where DROP race conditions would lead us to leak so many that
it becomes a serious problem.  Maybe I lack imagination.

			regards, tom lane
Re: [PROPOSAL] Shared Ispell dictionaries
Tomas Vondra writes:
> On 3/24/18 9:56 PM, Tom Lane wrote:
>> Also, the scheme for releasing the dictionary DSM during
>> RemoveTSDictionaryById is uncertain and full of race conditions:
>> the DROP might roll back later, or someone might come along and
>> start using the dictionary (causing a fresh DSM load) before the
>> DROP commits and makes the dictionary invisible to other sessions.
>> I don't think that either of those are necessarily fatal objections,
>> but there needs to be some commentary there explaining what happens.

> Actually, I think that's an issue - such a race condition might easily
> leak the shared memory forever (because the new dictionary will get a
> different OID etc.). It probably is not happening very often, because
> dictionaries are not dropped very often. But it needs fixing I think.

My thought was (a) the ROLLBACK case is ok, because the next use of
the dictionary will reload it, and (b) the reload-concurrently-with-
DROP case is annoying, because indeed it leaks, but the window is small
and it probably won't be an issue in practice.  We would need to be
sure that the DSM segment goes away at postmaster restart, but given
that I think it'd be tolerable.  Of course it'd be better not to have
the race, but I see no easy way to prevent it -- do you?

			regards, tom lane
Re: [PROPOSAL] Shared Ispell dictionaries
Arthur Zakirov writes:
> [ v10 patch versions ]

I took a quick look through this.  I agree with the comments about
mmap-ability not being something we should insist on now, and maybe not
ever.  However, in order to keep our options open, it seems like we
should minimize the amount of API we expose that's based on the current
implementation.  That leads me to the following thoughts:

* I cannot imagine a use-case for setting max_shared_dictionaries_size
to anything except "unlimited".  If it's not that, and you exceed it,
then subsequent backends load private copies of the dictionary, making
your memory situation rapidly worse not better.  I think we should lose
that GUC altogether and just load dictionaries automatically.

* Similarly, I see no point in a "sharable" option on individual
dictionaries, especially when there's only one allowed setting for
most dictionary types.  Let's lose that too.

* And that leads us to not particularly need a view telling which
dictionaries are loaded, either.  It's just an implementation detail
that users don't need to worry about.

This does beg the question of whether we need a way to flush dictionary
contents that's short of restarting the server (or short of dropping and
recreating the dictionary).  I'm not sure, but even if we do, none of
the above is necessary for that.

I do think it's required that changing the dictionary's options with
ALTER TEXT SEARCH DICTIONARY automatically cause a reload; but if that's
happening with this patch, I don't see where.  (It might work to use
the combination of dictionary OID and TID of the dictionary's pg_ts_dict
tuple as the lookup key for shared dictionaries.  Oh, and have you
thought about the possibility of conflicting OIDs in different DBs?
Probably the database OID has to be part of the key, as well.)
Also, the scheme for releasing the dictionary DSM during RemoveTSDictionaryById is uncertain and full of race conditions: the DROP might roll back later, or someone might come along and start using the dictionary (causing a fresh DSM load) before the DROP commits and makes the dictionary invisible to other sessions. I don't think that either of those are necessarily fatal objections, but there needs to be some commentary there explaining what happens. BTW, I was going to complain that this patch alters the API for dictionary template init functions without any documentation updates; but then I realized that there isn't any documentation to update. That pretty well sucks, but I suppose it's not the job of this patch to improve that situation. Still, you could spend a bit more effort on the commentary in ts_public.h in 0002, because that commentary is as close to an API spec as we've got. regards, tom lane
Re: [PROPOSAL] Shared Ispell dictionaries
On Wed, Mar 21, 2018 at 12:00:52PM +0300, Arthur Zakirov wrote:
> On Tue, Mar 20, 2018 at 09:30:15PM +0100, Tomas Vondra wrote:
> > I wonder if these restrictions are needed? I mean, why not allow setting
> > max_shared_dictionaries_size below the size of the loaded dictionaries?
> >
> > Of course, on the one hand those restrictions seem sensible. On the other
> > hand, perhaps in some cases it would be useful to allow violating them?
> >
> > I mean, why not simply disable loading of new dictionaries when
> >
> >     (max_shared_dictionaries_size < loaded_size)
> >
> > Maybe I'm over-thinking this though. It's probably safer and less
> > surprising to enforce the restrictions.
>
> Hm, yes, in some cases this check may be over-engineering. I thought it
> was reasonable and safer in the v7 patch. But there are similar GUCs,
> wal_keep_segments and max_wal_size, which don't do additional checks,
> and people are fine with them. So I removed that check from the
> variable.
>
> Please find the attached new version of the patch.

I forgot to fix the regression tests for max_shared_dictionaries_size.
Also I'm not confident about using pg_reload_conf() in regression tests;
I haven't found a place where pg_reload_conf() is used in tests. So I
removed the max_shared_dictionaries_size tests for now.

Sorry for the noise.

-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

diff --git a/src/backend/tsearch/spell.c b/src/backend/tsearch/spell.c
index b9fdd77e19..e071994523 100644
--- a/src/backend/tsearch/spell.c
+++ b/src/backend/tsearch/spell.c
@@ -78,6 +78,8 @@
 #define tmpalloc(sz)  MemoryContextAlloc(Conf->buildCxt, (sz))
 #define tmpalloc0(sz)  MemoryContextAllocZero(Conf->buildCxt, (sz))
 
+#define tmpstrdup(str)  MemoryContextStrdup(Conf->buildCxt, (str))
+
 /*
  * Prepare for constructing an ISpell dictionary.
  *
@@ -498,7 +500,7 @@ NIAddSpell(IspellDict *Conf, const char *word, const char *flag)
 	Conf->Spell[Conf->nspell] = (SPELL *) tmpalloc(SPELLHDRSZ + strlen(word) + 1);
 	strcpy(Conf->Spell[Conf->nspell]->word, word);
 	Conf->Spell[Conf->nspell]->p.flag = (*flag != '\0')
-		? cpstrdup(Conf, flag) : VoidString;
+		? tmpstrdup(flag) : VoidString;
 	Conf->nspell++;
 }
 
@@ -1040,7 +1042,7 @@ setCompoundAffixFlagValue(IspellDict *Conf, CompoundAffixFlag *entry,
 		entry->flag.i = i;
 	}
 	else
-		entry->flag.s = cpstrdup(Conf, s);
+		entry->flag.s = tmpstrdup(s);
 
 	entry->flagMode = Conf->flagMode;
 	entry->value = val;
@@ -1536,6 +1538,9 @@ nextline:
 	return;
 
 isnewformat:
+	pfree(recoded);
+	pfree(pstr);
+
 	if (oldformat)
 		ereport(ERROR,
 				(errcode(ERRCODE_CONFIG_FILE_ERROR),
diff --git a/contrib/dict_int/dict_int.c b/contrib/dict_int/dict_int.c
index 56ede37089..e11d1129e9 100644
--- a/contrib/dict_int/dict_int.c
+++ b/contrib/dict_int/dict_int.c
@@ -30,7 +30,7 @@ PG_FUNCTION_INFO_V1(dintdict_lexize);
 Datum
 dintdict_init(PG_FUNCTION_ARGS)
 {
-	List	   *dictoptions = (List *) PG_GETARG_POINTER(0);
+	DictInitData *init_data = (DictInitData *) PG_GETARG_POINTER(0);
 	DictInt    *d;
 	ListCell   *l;
 
@@ -38,7 +38,7 @@ dintdict_init(PG_FUNCTION_ARGS)
 	d->maxlen = 6;
 	d->rejectlong = false;
 
-	foreach(l, dictoptions)
+	foreach(l, init_data->dictoptions)
 	{
 		DefElem    *defel = (DefElem *) lfirst(l);
diff --git a/contrib/dict_xsyn/dict_xsyn.c b/contrib/dict_xsyn/dict_xsyn.c
index a79ece240c..c3146bae3c 100644
--- a/contrib/dict_xsyn/dict_xsyn.c
+++ b/contrib/dict_xsyn/dict_xsyn.c
@@ -140,7 +140,7 @@ read_dictionary(DictSyn *d, const char *filename)
 Datum
 dxsyn_init(PG_FUNCTION_ARGS)
 {
-	List	   *dictoptions = (List *) PG_GETARG_POINTER(0);
+	DictInitData *init_data = (DictInitData *) PG_GETARG_POINTER(0);
 	DictSyn    *d;
 	ListCell   *l;
 	char	   *filename = NULL;
@@ -153,7 +153,7 @@ dxsyn_init(PG_FUNCTION_ARGS)
 	d->matchsynonyms = false;
 	d->keepsynonyms = true;
 
-	foreach(l, dictoptions)
+	foreach(l, init_data->dictoptions)
 	{
 		DefElem    *defel = (DefElem *) lfirst(l);
diff --git a/contrib/unaccent/unaccent.c b/contrib/unaccent/unaccent.c
index 247c202755..2e66331ed8 100644
--- a/contrib/unaccent/unaccent.c
+++ b/contrib/unaccent/unaccent.c
@@ -267,12 +267,12 @@ PG_FUNCTION_INFO_V1(unaccent_init);
 Datum
 unaccent_init(PG_FUNCTION_ARGS)
 {
-	List	   *dictoptions = (List *) PG_GETARG_POINTER(0);
+	DictInitData *init_data = (DictInitData *) PG_GETARG_POINTER(0);
 	TrieChar   *rootTrie = NULL;
 	bool		fileloaded = false;
 	ListCell   *l;
 
-	foreach(l, dictoptions)
+	foreach(l, init_data->dictoptions)
 	{
 		DefElem    *defel = (DefElem *) lfirst(l);
Re: [PROPOSAL] Shared Ispell dictionaries
On Tue, Mar 20, 2018 at 09:30:15PM +0100, Tomas Vondra wrote:
> On 03/20/2018 02:11 PM, Arthur Zakirov wrote:
> > max_shared_dictionaries_size is defined as PGC_SIGHUP now. Added a check
> > of the new value to disallow setting it to zero if there are loaded
> > dictionaries and to disallow decreasing the maximum allowed size below
> > the currently loaded size.
>
> I wonder if these restrictions are needed? I mean, why not allow setting
> max_shared_dictionaries_size below the size of the loaded dictionaries?
>
> Of course, on the one hand those restrictions seem sensible. On the other
> hand, perhaps in some cases it would be useful to allow violating them?
>
> I mean, why not simply disable loading of new dictionaries when
>
>     (max_shared_dictionaries_size < loaded_size)
>
> Maybe I'm over-thinking this though. It's probably safer and less
> surprising to enforce the restrictions.

Hm, yes, in some cases this check may be over-engineering. I thought it
was reasonable and safer in the v7 patch. But there are similar GUCs,
wal_keep_segments and max_wal_size, which don't do additional checks,
and people are fine with them. So I removed that check from the
variable.

Please find the attached new version of the patch.

-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

diff --git a/src/backend/tsearch/spell.c b/src/backend/tsearch/spell.c
index b9fdd77e19..e071994523 100644
--- a/src/backend/tsearch/spell.c
+++ b/src/backend/tsearch/spell.c
@@ -78,6 +78,8 @@
 #define tmpalloc(sz)  MemoryContextAlloc(Conf->buildCxt, (sz))
 #define tmpalloc0(sz)  MemoryContextAllocZero(Conf->buildCxt, (sz))
 
+#define tmpstrdup(str)  MemoryContextStrdup(Conf->buildCxt, (str))
+
 /*
  * Prepare for constructing an ISpell dictionary.
  *
@@ -498,7 +500,7 @@ NIAddSpell(IspellDict *Conf, const char *word, const char *flag)
 	Conf->Spell[Conf->nspell] = (SPELL *) tmpalloc(SPELLHDRSZ + strlen(word) + 1);
 	strcpy(Conf->Spell[Conf->nspell]->word, word);
 	Conf->Spell[Conf->nspell]->p.flag = (*flag != '\0')
-		? cpstrdup(Conf, flag) : VoidString;
+		? tmpstrdup(flag) : VoidString;
 	Conf->nspell++;
 }
 
@@ -1040,7 +1042,7 @@ setCompoundAffixFlagValue(IspellDict *Conf, CompoundAffixFlag *entry,
 		entry->flag.i = i;
 	}
 	else
-		entry->flag.s = cpstrdup(Conf, s);
+		entry->flag.s = tmpstrdup(s);
 
 	entry->flagMode = Conf->flagMode;
 	entry->value = val;
@@ -1536,6 +1538,9 @@ nextline:
 	return;
 
 isnewformat:
+	pfree(recoded);
+	pfree(pstr);
+
 	if (oldformat)
 		ereport(ERROR,
 				(errcode(ERRCODE_CONFIG_FILE_ERROR),
diff --git a/contrib/dict_int/dict_int.c b/contrib/dict_int/dict_int.c
index 56ede37089..e11d1129e9 100644
--- a/contrib/dict_int/dict_int.c
+++ b/contrib/dict_int/dict_int.c
@@ -30,7 +30,7 @@ PG_FUNCTION_INFO_V1(dintdict_lexize);
 Datum
 dintdict_init(PG_FUNCTION_ARGS)
 {
-	List	   *dictoptions = (List *) PG_GETARG_POINTER(0);
+	DictInitData *init_data = (DictInitData *) PG_GETARG_POINTER(0);
 	DictInt    *d;
 	ListCell   *l;
 
@@ -38,7 +38,7 @@ dintdict_init(PG_FUNCTION_ARGS)
 	d->maxlen = 6;
 	d->rejectlong = false;
 
-	foreach(l, dictoptions)
+	foreach(l, init_data->dictoptions)
 	{
 		DefElem    *defel = (DefElem *) lfirst(l);
diff --git a/contrib/dict_xsyn/dict_xsyn.c b/contrib/dict_xsyn/dict_xsyn.c
index a79ece240c..c3146bae3c 100644
--- a/contrib/dict_xsyn/dict_xsyn.c
+++ b/contrib/dict_xsyn/dict_xsyn.c
@@ -140,7 +140,7 @@ read_dictionary(DictSyn *d, const char *filename)
 Datum
 dxsyn_init(PG_FUNCTION_ARGS)
 {
-	List	   *dictoptions = (List *) PG_GETARG_POINTER(0);
+	DictInitData *init_data = (DictInitData *) PG_GETARG_POINTER(0);
 	DictSyn    *d;
 	ListCell   *l;
 	char	   *filename = NULL;
@@ -153,7 +153,7 @@ dxsyn_init(PG_FUNCTION_ARGS)
 	d->matchsynonyms = false;
 	d->keepsynonyms = true;
 
-	foreach(l, dictoptions)
+	foreach(l, init_data->dictoptions)
 	{
 		DefElem    *defel = (DefElem *) lfirst(l);
diff --git a/contrib/unaccent/unaccent.c b/contrib/unaccent/unaccent.c
index 247c202755..2e66331ed8 100644
--- a/contrib/unaccent/unaccent.c
+++ b/contrib/unaccent/unaccent.c
@@ -267,12 +267,12 @@ PG_FUNCTION_INFO_V1(unaccent_init);
 Datum
 unaccent_init(PG_FUNCTION_ARGS)
 {
-	List	   *dictoptions = (List *) PG_GETARG_POINTER(0);
+	DictInitData *init_data = (DictInitData *) PG_GETARG_POINTER(0);
 	TrieChar   *rootTrie = NULL;
 	bool		fileloaded = false;
 	ListCell   *l;
 
-	foreach(l, dictoptions)
+	foreach(l, init_data->dictoptions)
 	{
 		DefElem    *defel = (DefElem *) lfirst(l);
diff --git a/src/backend/commands/tsearchcmds.c b/src/backend/commands/tsearchcmds.c
Re: [PROPOSAL] Shared Ispell dictionaries
On 03/20/2018 02:11 PM, Arthur Zakirov wrote:
> Hello,
>
> On Mon, Mar 19, 2018 at 08:50:46PM +0100, Tomas Vondra wrote:
>> Hi Arthur,
>>
>> I went through the patch - just skimming through the diffs, will do more
>> testing tomorrow. Here are a few initial comments.
>
> Thank you for the review!
>
>> 1) max_shared_dictionaries_size / PGC_POSTMASTER
>>
>> I'm not quite sure why the GUC is defined as PGC_POSTMASTER, i.e. why it
>> can't be changed after server start. That seems like a fairly useful
>> thing to do (e.g. increase the limit while the server is running), and
>> after looking at the code I think it shouldn't be difficult to change.
>
> max_shared_dictionaries_size is defined as PGC_SIGHUP now. Added a check
> of the new value to disallow setting it to zero if there are loaded
> dictionaries and to disallow decreasing the maximum allowed size below
> the currently loaded size.

I wonder if these restrictions are needed? I mean, why not allow setting
max_shared_dictionaries_size below the size of the loaded dictionaries?

Of course, on the one hand those restrictions seem sensible. On the other
hand, perhaps in some cases it would be useful to allow violating them?

I mean, why not simply disable loading of new dictionaries when

    (max_shared_dictionaries_size < loaded_size)

Maybe I'm over-thinking this though. It's probably safer and less
surprising to enforce the restrictions.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PROPOSAL] Shared Ispell dictionaries
Hello,

On Mon, Mar 19, 2018 at 08:50:46PM +0100, Tomas Vondra wrote:
> Hi Arthur,
>
> I went through the patch - just skimming through the diffs, will do more
> testing tomorrow. Here are a few initial comments.

Thank you for the review!

> 1) max_shared_dictionaries_size / PGC_POSTMASTER
>
> I'm not quite sure why the GUC is defined as PGC_POSTMASTER, i.e. why it
> can't be changed after server start. That seems like a fairly useful
> thing to do (e.g. increase the limit while the server is running), and
> after looking at the code I think it shouldn't be difficult to change.

max_shared_dictionaries_size is defined as PGC_SIGHUP now. Added a check
of the new value to disallow setting it to zero if there are loaded
dictionaries and to disallow decreasing the maximum allowed size below
the currently loaded size.

> The other thing I'd suggest is handling "-1" as "no limit".

I added the ability to set '-1'. Fixed some comments and the
documentation.

> 2) max_shared_dictionaries_size / size of number
>
> Some of the comments dealing with the GUC treat it as a number of
> dictionaries (instead of a size). I suppose that's due to how the
> original patch was implemented.

Fixed. Should be good now.

> 3) Assert(max_shared_dictionaries_size);
>
> I'd say that assert is not very clear - it should be
>
>     Assert(max_shared_dictionaries_size > 0);
>
> or something along those lines. It's also a good idea to add a comment
> explaining the assert, say
>
>     /* we can only get here when shared dictionaries are enabled */
>     Assert(max_shared_dictionaries_size > 0);

Fixed the assert and added the comment. I extended the assert so that it
also takes the -1 value into account.

> 4) I took the liberty of rewording some of the docs/comments. See the
> attached diffs, that should apply on top of the 0003 and 0004 patches.
> Please, treat those as mere suggestions.

I applied your diffs and added changes for max_shared_dictionaries_size.

Please find the attached new version of the patch.
-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

diff --git a/src/backend/tsearch/spell.c b/src/backend/tsearch/spell.c
index b9fdd77e19..e071994523 100644
--- a/src/backend/tsearch/spell.c
+++ b/src/backend/tsearch/spell.c
@@ -78,6 +78,8 @@
 #define tmpalloc(sz)  MemoryContextAlloc(Conf->buildCxt, (sz))
 #define tmpalloc0(sz)  MemoryContextAllocZero(Conf->buildCxt, (sz))
 
+#define tmpstrdup(str)  MemoryContextStrdup(Conf->buildCxt, (str))
+
 /*
  * Prepare for constructing an ISpell dictionary.
  *
@@ -498,7 +500,7 @@ NIAddSpell(IspellDict *Conf, const char *word, const char *flag)
 	Conf->Spell[Conf->nspell] = (SPELL *) tmpalloc(SPELLHDRSZ + strlen(word) + 1);
 	strcpy(Conf->Spell[Conf->nspell]->word, word);
 	Conf->Spell[Conf->nspell]->p.flag = (*flag != '\0')
-		? cpstrdup(Conf, flag) : VoidString;
+		? tmpstrdup(flag) : VoidString;
 	Conf->nspell++;
 }
 
@@ -1040,7 +1042,7 @@ setCompoundAffixFlagValue(IspellDict *Conf, CompoundAffixFlag *entry,
 		entry->flag.i = i;
 	}
 	else
-		entry->flag.s = cpstrdup(Conf, s);
+		entry->flag.s = tmpstrdup(s);
 
 	entry->flagMode = Conf->flagMode;
 	entry->value = val;
@@ -1536,6 +1538,9 @@ nextline:
 	return;
 
 isnewformat:
+	pfree(recoded);
+	pfree(pstr);
+
 	if (oldformat)
 		ereport(ERROR,
 				(errcode(ERRCODE_CONFIG_FILE_ERROR),
diff --git a/contrib/dict_int/dict_int.c b/contrib/dict_int/dict_int.c
index 56ede37089..e11d1129e9 100644
--- a/contrib/dict_int/dict_int.c
+++ b/contrib/dict_int/dict_int.c
@@ -30,7 +30,7 @@ PG_FUNCTION_INFO_V1(dintdict_lexize);
 Datum
 dintdict_init(PG_FUNCTION_ARGS)
 {
-	List	   *dictoptions = (List *) PG_GETARG_POINTER(0);
+	DictInitData *init_data = (DictInitData *) PG_GETARG_POINTER(0);
 	DictInt    *d;
 	ListCell   *l;
 
@@ -38,7 +38,7 @@ dintdict_init(PG_FUNCTION_ARGS)
 	d->maxlen = 6;
 	d->rejectlong = false;
 
-	foreach(l, dictoptions)
+	foreach(l, init_data->dictoptions)
 	{
 		DefElem    *defel = (DefElem *) lfirst(l);
diff --git a/contrib/dict_xsyn/dict_xsyn.c b/contrib/dict_xsyn/dict_xsyn.c
index a79ece240c..c3146bae3c 100644
--- a/contrib/dict_xsyn/dict_xsyn.c
+++ b/contrib/dict_xsyn/dict_xsyn.c
@@ -140,7 +140,7 @@ read_dictionary(DictSyn *d, const char *filename)
 Datum
 dxsyn_init(PG_FUNCTION_ARGS)
 {
-	List	   *dictoptions = (List *) PG_GETARG_POINTER(0);
+	DictInitData *init_data = (DictInitData *) PG_GETARG_POINTER(0);
 	DictSyn    *d;
 	ListCell   *l;
 	char	   *filename = NULL;
@@ -153,7 +153,7 @@ dxsyn_init(PG_FUNCTION_ARGS)
 	d->matchsynonyms = false;
 	d->keepsynonyms = true;
 
-	foreach(l, dictoptions)
+	foreach(l, init_data->dictoptions)
 	{
 		DefElem    *defel = (DefElem *)
Re: [PROPOSAL] Shared Ispell dictionaries
Hi Arthur, I went through the patch - just skimming through the diffs, will do more testing tomorrow. Here are a few initial comments. 1) max_shared_dictionaries_size / PGC_POSTMASTER I'm not quite sure why the GUC is defined as PGC_POSTMASTER, i.e. why it can't be changed after server start. That seems like a fairly useful thing to do (e.g. increase the limit while the server is running), and after looking at the code I think it shouldn't be difficult to change. The other thing I'd suggest is handling "-1" as "no limit". 2) max_shared_dictionaries_size / size of number Some of the comments dealing with the GUC treat it as a number of dictionaries (instead of a size). I suppose that's due to how the original patch was implemented. 3) Assert(max_shared_dictionaries_size); I'd say that assert is not very clear - it should be Assert(max_shared_dictionaries_size > 0); or something along those lines. It's also a good idea to add a comment explaining the assert, say /* we can only get here when shared dictionaries are enabled */ Assert(max_shared_dictionaries_size > 0); 4) I took the liberty of rewording some of the docs/comments. See the attached diffs, which should apply on top of the 0003 and 0004 patches. Please treat those as mere suggestions. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index 6862d5e..6747fe2 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -1433,23 +1433,23 @@ include_dir 'conf.d' -Sets the maximum size of all text search dictionaries loaded into shared -memory. The default is 100 megabytes (100MB). This -parameter can only be set at server start. +Specifies the amount of shared memory to be used to store full-text +search dictionaries. The default is 100 megabytes (100MB). +This parameter can only be set at server start. -Currently controls only loading of Ispell -dictionaries (see ). 
-After compiling the dictionary it will be copied into shared memory. -Another backends on first use of the dictionary will use it from shared -memory, so it doesn't need to compile the dictionary second time. +Currently only Ispell dictionaries (see +) may be loaded into +shared memory. The first backend requesting the dictionary will +build it and copy it into shared memory, so that other backends can +reuse it without having to build it again. -If total size of simultaneously loaded dictionaries reaches the maximum -allowed size then a new dictionary will be loaded into local memory of -a backend. +If the size of simultaneously loaded dictionaries reaches the maximum +allowed size, additional dictionaries will be loaded into private backend +memory (effectively disabling the sharing). diff --git a/src/backend/tsearch/ts_shared.c b/src/backend/tsearch/ts_shared.c index bfc5292..22d58a0 100644 --- a/src/backend/tsearch/ts_shared.c +++ b/src/backend/tsearch/ts_shared.c @@ -22,7 +22,7 @@ /* - * Hash table structures + * Hash table entries representing shared dictionaries. */ typedef struct { @@ -37,7 +37,8 @@ typedef struct static dshash_table *dict_table = NULL; /* - * Shared struct for locking + * Information about the main shmem segment, used to coordinate + * access to the hash table and dictionaries. */ typedef struct { @@ -53,8 +54,8 @@ typedef struct static TsearchCtlData *tsearch_ctl; /* - * GUC variable for maximum number of shared dictionaries. Default value is - * 100MB. + * Maximum allowed amount of shared memory for shared dictionaries, + * in kilobytes. Default value is 100MB. */ int max_shared_dictionaries_size = 100 * 1024; @@ -213,7 +202,7 @@ ts_dict_shmem_location(DictInitData *initoptions, /* * Release memory occupied by the dictionary. Function just unpins DSM mapping. - * If nobody else hasn't mapping to this DSM then unping DSM segment. + * If nobody else has a mapping to this DSM, unpin the DSM segment. * * dictid: Oid of the dictionary. 
*/ @@ -312,6 +301,7 @@ init_dict_table(void) MemoryContext old_context; dsa_area *dsa; + /* bail out if shared dictionaries not allowed */ if (max_shared_dictionaries_size == 0) return; diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c index 172627a..b10ec48 100644 --- a/src/backend/utils/misc/guc.c +++ b/src/backend/utils/misc/guc.c @@ -2939,7 +2939,7 @@ static struct config_int ConfigureNamesInt[] = gettext_noop("Currently controls only loading of Ispell dictionaries. " "If total size of simultaneously loaded dictionaries " "reaches the maximum allowed size then a new dictionary " -
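The two GUC suggestions above - making the limit changeable after server start, and treating "-1" as "no limit" - could be sketched as a guc.c entry roughly like the following. This is only an illustration: the PGC_SIGHUP context, group, bounds, and hook slots are assumptions, not taken from the actual patch.

```c
/* Hypothetical config_int entry in guc.c; all field values illustrative. */
{
    {"max_shared_dictionaries_size", PGC_SIGHUP, RESOURCES_MEM,
        gettext_noop("Sets the maximum size of all text search dictionaries "
                     "loaded into shared memory."),
        gettext_noop("-1 means no limit, 0 disables loading dictionaries "
                     "into shared memory."),
        GUC_UNIT_KB
    },
    &max_shared_dictionaries_size,
    100 * 1024,                 /* default: 100MB */
    -1, INT_MAX,                /* -1 = no limit */
    NULL, NULL, NULL            /* check/assign/show hooks unused */
},
```

With PGC_SIGHUP the limit could be raised with a config reload; lowering it at runtime would of course not shrink dictionaries already loaded, only gate new loads.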
Re: [PROPOSAL] Shared Ispell dictionaries
On 03/19/2018 07:07 PM, Andres Freund wrote: > On 2018-03-19 14:52:34 +0100, Tomas Vondra wrote: >> On 03/19/2018 02:34 AM, Andres Freund wrote: >>> Hi, >>> >>> On 2018-03-19 01:52:41 +0100, Tomas Vondra wrote: I do agree with that. We have a working well-understood dsm-based solution, addressing the goals initially explained in this thread. >>> >>> Well, it's also awkward and manual to use. I do think that's >>> something we've to pay attention to. >>> >> >> Awkward in what sense? > > You've to manually configure a setting that can only be set at server > start. You can't set it as big as necessary because it might use up > memory better used for other things. It needs the full space for > dictionaries even if the majority of it never will be needed. All of > those aren't needed in an mmap world. > Which is not quite true, because that's not what the patch does. Each dictionary is loaded into a separate dsm segment when needed, which is then stored in a dhash table. So most of what you wrote is not really true - the patch does not pre-allocate the space, and the setting might be set even after server start (it's not defined like that currently, but that should be trivial to change). > >> So, I'm not at all convinced the mmap approach is actually better >> than the dsm one. And I believe that if we come up with a good way >> to automate some of the tasks, I don't see why would that be >> possible in the mmap and not dsm. > > To me it seems we'll end up needing a heck of a lot more code that > the OS already implements if we do it ourselves. > Like what? Which features do you expect to need much more code? The automated reloading will need a fairly small amount of code - the main issue is deciding when to reload, and as I mentioned before that's more complicated than you seem to believe. In fact, it may not even be possible - there's no way to decide if all files are already updated. 
Currently we kinda ignore that, on the assumption that dictionaries change only rarely. We may do the same thing and reload the dict if at least one file changes. In any case, the amount of code is trivial. In fact, it may be more complicated in the mmap case - how do you update a dictionary that is already mapped into multiple processes? The eviction is harder - I'll give you that. But then again, I'm not sure the mmap approach is really what we want here - it seems better to evict a whole dictionary than some random pages from many of them. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
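The "reload if at least one file changes" rule discussed above can be made concrete with a small mtime check over the three source files. This is a standalone sketch, not PostgreSQL code; the struct and function names are illustrative only.

```c
#include <stdbool.h>
#include <stdio.h>
#include <sys/stat.h>
#include <time.h>

/*
 * Hypothetical record of the mtimes captured when the dictionary was
 * last (re)built.  All names here are illustrative, not from the patch.
 */
typedef struct DictFileStamps
{
    time_t      dict_mtime;
    time_t      affix_mtime;
    time_t      stop_mtime;
} DictFileStamps;

/* Return the mtime of a file, or 0 if it cannot be stat'ed. */
static time_t
file_mtime(const char *path)
{
    struct stat st;

    if (path == NULL || stat(path, &st) != 0)
        return 0;
    return st.st_mtime;
}

/*
 * A dictionary is considered stale if *any* of the three source files
 * changed, since reloading on a partial update could produce broken
 * results (e.g. a new .dict paired with an old .affix).
 */
static bool
dict_files_changed(const DictFileStamps *stamps,
                   const char *dictfile,
                   const char *affixfile,
                   const char *stopfile)
{
    return file_mtime(dictfile) != stamps->dict_mtime ||
        file_mtime(affixfile) != stamps->affix_mtime ||
        file_mtime(stopfile) != stamps->stop_mtime;
}
```

Note this only detects that a change happened; it cannot tell whether all three files have finished being updated, which is the race Tomas points out above.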
Re: [PROPOSAL] Shared Ispell dictionaries
On 2018-03-19 14:52:34 +0100, Tomas Vondra wrote: > On 03/19/2018 02:34 AM, Andres Freund wrote: > > Hi, > > > > On 2018-03-19 01:52:41 +0100, Tomas Vondra wrote: > >> I do agree with that. We have a working well-understood dsm-based > >> solution, addressing the goals initially explained in this thread. > > > > Well, it's also awkward and manual to use. I do think that's > > something we've to pay attention to. > > > > Awkward in what sense? You've to manually configure a setting that can only be set at server start. You can't set it as big as necessary because it might use up memory better used for other things. It needs the full space for dictionaries even if the majority of it never will be needed. All of those aren't needed in an mmap world. > So, I'm not at all convinced the mmap approach is actually better than > the dsm one. And I believe that if we come up with a good way to > automate some of the tasks, I don't see why would that be possible in > the mmap and not dsm. To me it seems we'll end up needing a heck of a lot more code that the OS already implements if we do it ourselves. Greetings, Andres Freund
Re: [PROPOSAL] Shared Ispell dictionaries
On 03/19/2018 02:34 AM, Andres Freund wrote: > Hi, > > On 2018-03-19 01:52:41 +0100, Tomas Vondra wrote: >> I do agree with that. We have a working well-understood dsm-based >> solution, addressing the goals initially explained in this thread. > > Well, it's also awkward and manual to use. I do think that's > something we've to pay attention to. > Awkward in what sense? I don't think the manual aspect is an issue. Currently we have no way to reload the dictionary, except for restarting all the backends. I don't see that as a particularly convenient solution. Also, this is pretty much how the shared_ispell extension works, although you might argue that was more due to the limitation of how shared memory could be used in extensions before DSM was introduced. In any case, I've never heard complaints about this aspect of the extension. There are two things that might be automated - reloading of dictionaries and evicting them when hitting the memory limit. I have tried to implement that in the shared_ispell dictionary but it's a bit more complicated than it looks. For example, it seems obvious to reload the dictionary when the file timestamp changes. But in fact there are three files - dict, affixes, stopwords. So will you reload when a single file changes? All of them? Keep in mind that the new version of the dictionary may use different affixes, so a reload at the wrong moment may produce broken results. > >> I wonder how much of this patch would be affected by the switch >> from dsm to mmap? I guess the memory limit would get mostly >> irrelevant (mmap would rely on the OS to page the memory in/out >> depending on memory pressure), and so would the UNLOAD/RELOAD >> commands (because each backend would do it's own mmap). > > Those seem fairly major. > I'm not sure I'd say those are major. And you might also see the lack of these capabilities as negative points for the mmap approach. So, I'm not at all convinced the mmap approach is actually better than the dsm one. 
And I believe that if we come up with a good way to automate some of the tasks, I don't see why would that be possible in the mmap and not dsm. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PROPOSAL] Shared Ispell dictionaries
On Mon, 19 Mar 2018 14:06:50 +0300 Arthur Zakirovwrote: > > I beleive mmap requires completely rewrite 0003 part of the patch and > a little changes in 0005. > > > In any case, I suggest to polish the dsm-based patch, and see if we > > can get that one into PG11. > > Yes we have more time in future commitfests if dsm-based patch won't > be approved. > Hi, I'm not sure about mmap approach, it would just bring another problems. I like the dsm approach because it's not inventing any new files in the database, when mmap approach will possibly require new folder in data directory and management above bunch of new files, with additional issues related with pg_upgrade and etc. Also in dsm approach if someone needs to update dictionaries then he (or his package manager) can just replace files and be done with it. -- --- Ildus Kurbangaliev Postgres Professional: http://www.postgrespro.com Russian Postgres Company
Re: [PROPOSAL] Shared Ispell dictionaries
Arthur Zakirov wrote: > I've planned only to improve the documentation a little. Also it seems I > should change 0004 part, I found that extension upgrade scripts may be made > in wrong way. I've attached a new version of the patch. In this version I removed 0004-Update-tmplinit-arguments-v6.patch. In my opinion it handled extension upgrades in the wrong way. If I'm not mistaken, there is currently no way to upgrade a template's init function signature, and I didn't find a way to change init_method(internal) to init_method(internal, internal) within an extension's upgrade script. Therefore I added 0002-Change-tmplinit-argument-v7.patch. Now a DictInitData struct is passed to a template's init method. It contains the necessary data: dictoptions and dictid. And there is no need to change the method's signature. The other parts of the patch are the same, except that they now use the DictInitData structure. On Mon, Mar 19, 2018 at 01:52:41AM +0100, Tomas Vondra wrote: > I wonder how much of this patch would be affected by the switch from dsm > to mmap? I guess the memory limit would get mostly irrelevant (mmap > would rely on the OS to page the memory in/out depending on memory > pressure), and so would the UNLOAD/RELOAD commands (because each backend > would do it's own mmap). I believe mmap requires completely rewriting the 0003 part of the patch and small changes in 0005. > In any case, I suggest to polish the dsm-based patch, and see if we can > get that one into PG11. Yes, we'll have more time in future commitfests if the dsm-based patch isn't approved. 
-- Arthur Zakirov Postgres Professional: http://www.postgrespro.com Russian Postgres Company diff --git a/src/backend/tsearch/spell.c b/src/backend/tsearch/spell.c index b9fdd77e19..e071994523 100644 --- a/src/backend/tsearch/spell.c +++ b/src/backend/tsearch/spell.c @@ -78,6 +78,8 @@ #define tmpalloc(sz) MemoryContextAlloc(Conf->buildCxt, (sz)) #define tmpalloc0(sz) MemoryContextAllocZero(Conf->buildCxt, (sz)) +#define tmpstrdup(str) MemoryContextStrdup(Conf->buildCxt, (str)) + /* * Prepare for constructing an ISpell dictionary. * @@ -498,7 +500,7 @@ NIAddSpell(IspellDict *Conf, const char *word, const char *flag) Conf->Spell[Conf->nspell] = (SPELL *) tmpalloc(SPELLHDRSZ + strlen(word) + 1); strcpy(Conf->Spell[Conf->nspell]->word, word); Conf->Spell[Conf->nspell]->p.flag = (*flag != '\0') - ? cpstrdup(Conf, flag) : VoidString; + ? tmpstrdup(flag) : VoidString; Conf->nspell++; } @@ -1040,7 +1042,7 @@ setCompoundAffixFlagValue(IspellDict *Conf, CompoundAffixFlag *entry, entry->flag.i = i; } else - entry->flag.s = cpstrdup(Conf, s); + entry->flag.s = tmpstrdup(s); entry->flagMode = Conf->flagMode; entry->value = val; @@ -1536,6 +1538,9 @@ nextline: return; isnewformat: + pfree(recoded); + pfree(pstr); + if (oldformat) ereport(ERROR, (errcode(ERRCODE_CONFIG_FILE_ERROR), diff --git a/contrib/dict_int/dict_int.c b/contrib/dict_int/dict_int.c index 56ede37089..e11d1129e9 100644 --- a/contrib/dict_int/dict_int.c +++ b/contrib/dict_int/dict_int.c @@ -30,7 +30,7 @@ PG_FUNCTION_INFO_V1(dintdict_lexize); Datum dintdict_init(PG_FUNCTION_ARGS) { - List *dictoptions = (List *) PG_GETARG_POINTER(0); + DictInitData *init_data = (DictInitData *) PG_GETARG_POINTER(0); DictInt*d; ListCell *l; @@ -38,7 +38,7 @@ dintdict_init(PG_FUNCTION_ARGS) d->maxlen = 6; d->rejectlong = false; - foreach(l, dictoptions) + foreach(l, init_data->dictoptions) { DefElem*defel = (DefElem *) lfirst(l); diff --git a/contrib/dict_xsyn/dict_xsyn.c b/contrib/dict_xsyn/dict_xsyn.c index a79ece240c..c3146bae3c 
100644 --- a/contrib/dict_xsyn/dict_xsyn.c +++ b/contrib/dict_xsyn/dict_xsyn.c @@ -140,7 +140,7 @@ read_dictionary(DictSyn *d, const char *filename) Datum dxsyn_init(PG_FUNCTION_ARGS) { - List *dictoptions = (List *) PG_GETARG_POINTER(0); + DictInitData *init_data = (DictInitData *) PG_GETARG_POINTER(0); DictSyn*d; ListCell *l; char *filename = NULL; @@ -153,7 +153,7 @@ dxsyn_init(PG_FUNCTION_ARGS) d->matchsynonyms = false; d->keepsynonyms = true; - foreach(l, dictoptions) + foreach(l, init_data->dictoptions) { DefElem*defel = (DefElem *) lfirst(l); diff --git a/contrib/unaccent/unaccent.c b/contrib/unaccent/unaccent.c index 247c202755..2e66331ed8 100644 --- a/contrib/unaccent/unaccent.c +++ b/contrib/unaccent/unaccent.c @@ -267,12 +267,12 @@ PG_FUNCTION_INFO_V1(unaccent_init); Datum unaccent_init(PG_FUNCTION_ARGS) { - List *dictoptions = (List *) PG_GETARG_POINTER(0); + DictInitData *init_data = (DictInitData *) PG_GETARG_POINTER(0); TrieChar
Re: [PROPOSAL] Shared Ispell dictionaries
Hi, On 2018-03-19 01:52:41 +0100, Tomas Vondra wrote: > I do agree with that. We have a working well-understood dsm-based > solution, addressing the goals initially explained in this thread. Well, it's also awkward and manual to use. I do think that's something we've to pay attention to. > I wonder how much of this patch would be affected by the switch from dsm > to mmap? I guess the memory limit would get mostly irrelevant (mmap > would rely on the OS to page the memory in/out depending on memory > pressure), and so would the UNLOAD/RELOAD commands (because each backend > would do it's own mmap). Those seem fairly major. Greetings, Andres Freund
Re: [PROPOSAL] Shared Ispell dictionaries
On 03/17/2018 05:43 AM, Arthur Zakirov wrote: > Hello Tomas, > > Arthur, what are your plans with this patch in the current CF? > > > I think dsm-based approach is in good shape already and works nice. > I've planned only to improve the documentation a little. Also it seems I > should change 0004 part, I found that extension upgrade scripts may be > made in wrong way. > In my opinion RELOAD and UNLOAD commands can be made in next commitfest > (2018-09). > Did you look it? Have you arguments about how shared memory allocation > and releasing functions are made? > > > > It does not seem to be moving towards RFC very much, and reworking the > patch to use mmap() seems like a quite significant change late in the > CF. Which means it's likely to cause the patch get get bumped to the > next CF (2018-09). > > > Agree. I have a draft version for mmap-based approach which works in > platforms with mmap. In Windows it is necessary to use another API > (CreateFileMapping, etc). But this approach requires more work on > handling processed dictionary files (how name them, when remove). > > > > FWIW I am not quite sure if the mmap() approach is better than what was > implemented by the patch. I'm not sure how exactly will it behave under > memory pressure (AFAIK it goes through page cache, which means random > parts of dictionaries might get evicted) or how well is it supported on > various platforms (say, Windows). > > > Yes, as I wrote mmap-based approach requires more work. The only > benefit I see is that you don't need to process a dictionary after > server restart. I'd vote for dsm-based approach. > I do agree with that. We have a working well-understood dsm-based solution, addressing the goals initially explained in this thread. I don't see a reason to stall this patch based on a mere assumption that the mmap-based approach might be magically better in some unknown aspects. It might be, but we may as well leave that as a future work. 
I wonder how much of this patch would be affected by the switch from dsm to mmap? I guess the memory limit would get mostly irrelevant (mmap would rely on the OS to page the memory in/out depending on memory pressure), and so would the UNLOAD/RELOAD commands (because each backend would do its own mmap). In any case, I suggest polishing the dsm-based patch and seeing if we can get that one into PG11. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PROPOSAL] Shared Ispell dictionaries
Hello Tomas, > Arthur, what are your plans with this patch in the current CF? I think the dsm-based approach is in good shape already and works nicely. I've planned only to improve the documentation a little. Also it seems I should change the 0004 part; I found that the extension upgrade scripts may be done in the wrong way. In my opinion the RELOAD and UNLOAD commands can be done in the next commitfest (2018-09). Did you look at it? Do you have any comments about how the shared memory allocation and release functions are implemented? > > It does not seem to be moving towards RFC very much, and reworking the > patch to use mmap() seems like a quite significant change late in the > CF. Which means it's likely to cause the patch get get bumped to the > next CF (2018-09). Agreed. I have a draft version of the mmap-based approach which works on platforms with mmap. On Windows it is necessary to use another API (CreateFileMapping, etc). But this approach requires more work on handling the processed dictionary files (how to name them, when to remove them). > > FWIW I am not quite sure if the mmap() approach is better than what was > implemented by the patch. I'm not sure how exactly will it behave under > memory pressure (AFAIK it goes through page cache, which means random > parts of dictionaries might get evicted) or how well is it supported on > various platforms (say, Windows). > Yes, as I wrote, the mmap-based approach requires more work. The only benefit I see is that you don't need to process a dictionary after a server restart. I'd vote for the dsm-based approach. -- Arthur Zakirov Postgres Professional: http://www.postgrespro.com Russian Postgres Company
Re: [PROPOSAL] Shared Ispell dictionaries
On 03/07/2018 02:18 PM, Arthur Zakirov wrote: > On Wed, Mar 07, 2018 at 02:12:32PM +0100, Pavel Stehule wrote: >> 2018-03-07 14:10 GMT+01:00 Pavel Stehule: >>> 2018-03-07 13:58 GMT+01:00 Arthur Zakirov : Oh understood. Tomas suggested those commands too earlier. I'll implement them. But I think it is better to track files modification time too. Because now, without the patch, users don't have to call additional commands to refresh their dictionaries, so without such tracking we'll made dictionaries maintenance harder. >>> >>> Postgres hasn't any subsystem based on modification time, so >>> introduction this sensitivity, I don't see, practical. >>> >> >> Usually the shared dictionaries are used for complex language >> based fulltext. The frequence of updates of these dictionaries is >> less than updates PostgreSQL. The czech dictionary is same 10 >> years. > > Agree. In this case auto reloading isn't important feature here. > Arthur, what are your plans with this patch in the current CF? It does not seem to be moving towards RFC very much, and reworking the patch to use mmap() seems like quite a significant change late in the CF. Which means it's likely to cause the patch to get bumped to the next CF (2018-09). FWIW I am not quite sure if the mmap() approach is better than what was implemented by the patch. I'm not sure how exactly it will behave under memory pressure (AFAIK it goes through page cache, which means random parts of dictionaries might get evicted) or how well it is supported on various platforms (say, Windows). regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PROPOSAL] Shared Ispell dictionaries
On Wed, Mar 07, 2018 at 02:12:32PM +0100, Pavel Stehule wrote: > 2018-03-07 14:10 GMT+01:00 Pavel Stehule: > > 2018-03-07 13:58 GMT+01:00 Arthur Zakirov : > >> Oh understood. Tomas suggested those commands too earlier. I'll > >> implement them. But I think it is better to track files modification time > >> too. Because now, without the patch, users don't have to call additional > >> commands to refresh their dictionaries, so without such tracking we'll > >> made dictionaries maintenance harder. > >> > > > > Postgres hasn't any subsystem based on modification time, so introduction > > this sensitivity, I don't see, practical. > > > > Usually the shared dictionaries are used for complex language based > fulltext. The frequence of updates of these dictionaries is less than > updates PostgreSQL. The czech dictionary is same 10 years. Agreed. In this case auto-reloading isn't an important feature here. -- Arthur Zakirov Postgres Professional: http://www.postgrespro.com Russian Postgres Company
Re: [PROPOSAL] Shared Ispell dictionaries
2018-03-07 14:10 GMT+01:00 Pavel Stehule: > > > 2018-03-07 13:58 GMT+01:00 Arthur Zakirov : > >> On Wed, Mar 07, 2018 at 01:47:25PM +0100, Pavel Stehule wrote: >> > > Do you mean that a shared dictionary should be reloaded if its .affix >> > > and .dict files was changed? IMHO we can store last modification >> > > timestamp of them in a preprocessed file, and then we can rebuild the >> > > dictionary if files was changed. >> > > >> > >> > No, it is not necessary - just there should be commands (functions) for >> > preload dictiory and unload dictionary. >> >> Oh understood. Tomas suggested those commands too earlier. I'll >> implement them. But I think it is better to track files modification time >> too. Because now, without the patch, users don't have to call additional >> commands to refresh their dictionaries, so without such tracking we'll >> made dictionaries maintenance harder. >> > > Postgres hasn't any subsystem based on modification time, so introduction > this sensitivity, I don't see, practical. > Usually shared dictionaries are used for complex language-based full-text search. These dictionaries are updated less frequently than PostgreSQL itself. The Czech dictionary has been the same for 10 years. Regards Pavel > > Regards > > Pavel > > > >> >> -- >> Arthur Zakirov >> Postgres Professional: http://www.postgrespro.com >> Russian Postgres Company >> > >
Re: [PROPOSAL] Shared Ispell dictionaries
2018-03-07 13:58 GMT+01:00 Arthur Zakirov: > On Wed, Mar 07, 2018 at 01:47:25PM +0100, Pavel Stehule wrote: > > > Do you mean that a shared dictionary should be reloaded if its .affix > > > and .dict files was changed? IMHO we can store last modification > > > timestamp of them in a preprocessed file, and then we can rebuild the > > > dictionary if files was changed. > > > > > > > No, it is not necessary - just there should be commands (functions) for > > preload dictiory and unload dictionary. > > Oh understood. Tomas suggested those commands too earlier. I'll > implement them. But I think it is better to track files modification time > too. Because now, without the patch, users don't have to call additional > commands to refresh their dictionaries, so without such tracking we'll > made dictionaries maintenance harder. > Postgres doesn't have any subsystem based on modification time, so I don't see introducing this sensitivity as practical. Regards Pavel > > -- > Arthur Zakirov > Postgres Professional: http://www.postgrespro.com > Russian Postgres Company >
Re: [PROPOSAL] Shared Ispell dictionaries
2018-03-07 13:43 GMT+01:00 Arthur Zakirov: > On Wed, Mar 07, 2018 at 01:02:07PM +0100, Pavel Stehule wrote: > > > Understand. I'm not againts the mmap() approach, just I have lack of > > > understanding mmap() benefits... Current shared Ispell approach requires > > > preprocessing after server restarting, and the main advantage of mmap() > > > here > > > is that mmap() doesn't require preprocessing after restarting. > > > > > > Speaking about the implementation. > > > > > > It seems that the most appropriate place to store preprocessed files is > > > 'pg_dynshmem' folder. File prefix could be 'ts_dict.', otherwise > > > dsm_cleanup_for_mmap() will remove them. > > > > > > I'm not sure about reusing dsm_impl_mmap() and dsm_impl_windows(). But > > > maybe it's worth to reuse them. > > > > > > > I don't think so serialization to file (mmap) has not too sense. But the > > shared dictionary should loaded every time, and should be released every > > time if it is possible.Maybe there can be some background worker, that > > holds dictionary in memory. > > Do you mean that a shared dictionary should be reloaded if its .affix > and .dict files was changed? IMHO we can store last modification > timestamp of them in a preprocessed file, and then we can rebuild the > dictionary if files was changed. > No, it is not necessary - there should just be commands (functions) to preload and unload a dictionary. > > -- > Arthur Zakirov > Postgres Professional: http://www.postgrespro.com > Russian Postgres Company >
Re: [PROPOSAL] Shared Ispell dictionaries
On Wed, Mar 07, 2018 at 10:55:29AM +0100, Tomas Vondra wrote: > On 03/07/2018 09:55 AM, Arthur Zakirov wrote: > > Hello Andres, > > > > On Thu, Mar 01, 2018 at 08:31:49PM -0800, Andres Freund wrote: > >> Is there any chance we can instead can convert dictionaries into a form > >> we can just mmap() into memory? That'd scale a lot higher and more > >> dynamicallly? > > > > To avoid misunderstanding can you please elaborate on using mmap()? The > > DSM approach looks like more simple and requires less code. Also DSM may > > use mmap() if I'm not mistaken. > > > > I think the mmap() idea is that you preprocess the dictionary, store the > result in a file, and then mmap it when needed, without the expensive > preprocessing. Understood. I'm not against the mmap() approach, it's just that I don't fully understand the benefits of mmap()... The current shared Ispell approach requires preprocessing after a server restart, and the main advantage of mmap() here is that it doesn't require that preprocessing. Speaking about the implementation: it seems that the most appropriate place to store preprocessed files is the 'pg_dynshmem' folder. The file prefix could be 'ts_dict.'; otherwise dsm_cleanup_for_mmap() will remove them. I'm not sure about reusing dsm_impl_mmap() and dsm_impl_windows(), but maybe it's worth reusing them. -- Arthur Zakirov Postgres Professional: http://www.postgrespro.com Russian Postgres Company
Re: [PROPOSAL] Shared Ispell dictionaries
On 03/07/2018 09:55 AM, Arthur Zakirov wrote: > Hello Andres, > > On Thu, Mar 01, 2018 at 08:31:49PM -0800, Andres Freund wrote: >> Is there any chance we can instead can convert dictionaries into a form >> we can just mmap() into memory? That'd scale a lot higher and more >> dynamicallly? > > To avoid misunderstanding can you please elaborate on using mmap()? The > DSM approach looks like more simple and requires less code. Also DSM may > use mmap() if I'm not mistaken. > I think the mmap() idea is that you preprocess the dictionary, store the result in a file, and then mmap it when needed, without the expensive preprocessing. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
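The preprocess-then-mmap() idea described above can be sketched in a few lines of POSIX C. This is purely illustrative: the real patch would additionally need a serialized, pointer-free dictionary format, Windows support via CreateFileMapping(), and resowner bookkeeping so mappings aren't leaked on abort.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Map a preprocessed dictionary file read-only.  On success, returns the
 * mapping and stores its size in *size; the caller munmap()s it when done.
 * Function name and error handling are illustrative, not from the patch.
 */
static void *
map_dict_file(const char *path, size_t *size)
{
    int         fd;
    struct stat st;
    void       *p;

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;
    if (fstat(fd, &st) != 0 || st.st_size <= 0)
    {
        close(fd);
        return NULL;
    }
    /* MAP_SHARED: every process mapping the file shares the page cache pages */
    p = mmap(NULL, (size_t) st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                  /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return NULL;
    *size = (size_t) st.st_size;
    return p;
}
```

Because the mapping is backed by the file, this avoids the double-buffering Robert mentions elsewhere in the thread (page cache plus DSM copy), at the cost of the eviction behavior being left to the kernel.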
Re: [PROPOSAL] Shared Ispell dictionaries
On Wed, Feb 07, 2018 at 07:28:29PM +0300, Arthur Zakirov wrote: > Here is rebased version of the patch due to changes into dict_ispell.c. > The patch itself wasn't changed. Here is rebased version of the patch due to changes within pg_proc.h. I haven't implemented a mmap prototype yet, though. -- Arthur Zakirov Postgres Professional: http://www.postgrespro.com Russian Postgres Company diff --git a/src/backend/tsearch/spell.c b/src/backend/tsearch/spell.c index b9fdd77e19..e071994523 100644 --- a/src/backend/tsearch/spell.c +++ b/src/backend/tsearch/spell.c @@ -78,6 +78,8 @@ #define tmpalloc(sz) MemoryContextAlloc(Conf->buildCxt, (sz)) #define tmpalloc0(sz) MemoryContextAllocZero(Conf->buildCxt, (sz)) +#define tmpstrdup(str) MemoryContextStrdup(Conf->buildCxt, (str)) + /* * Prepare for constructing an ISpell dictionary. * @@ -498,7 +500,7 @@ NIAddSpell(IspellDict *Conf, const char *word, const char *flag) Conf->Spell[Conf->nspell] = (SPELL *) tmpalloc(SPELLHDRSZ + strlen(word) + 1); strcpy(Conf->Spell[Conf->nspell]->word, word); Conf->Spell[Conf->nspell]->p.flag = (*flag != '\0') - ? cpstrdup(Conf, flag) : VoidString; + ? tmpstrdup(flag) : VoidString; Conf->nspell++; } @@ -1040,7 +1042,7 @@ setCompoundAffixFlagValue(IspellDict *Conf, CompoundAffixFlag *entry, entry->flag.i = i; } else - entry->flag.s = cpstrdup(Conf, s); + entry->flag.s = tmpstrdup(s); entry->flagMode = Conf->flagMode; entry->value = val; @@ -1536,6 +1538,9 @@ nextline: return; isnewformat: + pfree(recoded); + pfree(pstr); + if (oldformat) ereport(ERROR, (errcode(ERRCODE_CONFIG_FILE_ERROR), diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index 259a2d83b4..439d2cdf87 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -1364,6 +1364,35 @@ include_dir 'conf.d' + + max_shared_dictionaries_size (integer) + + max_shared_dictionaries_size configuration parameter + + + + +Sets the maximum size of all text search dictionaries loaded into shared +memory. 
The default is 100 megabytes (100MB). This +parameter can only be set at server start. + + + +Currently controls only loading of Ispell +dictionaries (see ). +After compiling the dictionary it will be copied into shared memory. +Other backends will then use it from shared memory on first use of the +dictionary, so they don't need to compile the dictionary a second time. + + + +If the total size of simultaneously loaded dictionaries reaches the maximum +allowed size, then new dictionaries will be loaded into the local memory of +a backend. + + + + huge_pages (enum) diff --git a/src/backend/commands/tsearchcmds.c b/src/backend/commands/tsearchcmds.c index 3a843512d1..b6aeae449b 100644 --- a/src/backend/commands/tsearchcmds.c +++ b/src/backend/commands/tsearchcmds.c @@ -39,6 +39,7 @@ #include "nodes/makefuncs.h" #include "parser/parse_func.h" #include "tsearch/ts_cache.h" +#include "tsearch/ts_shared.h" #include "tsearch/ts_utils.h" #include "utils/builtins.h" #include "utils/fmgroids.h" @@ -396,7 +397,8 @@ verify_dictoptions(Oid tmplId, List *dictoptions) * Call the init method and see if it complains. We don't worry about * it leaking memory, since our command will soon be over anyway. 
*/ - (void) OidFunctionCall1(initmethod, PointerGetDatum(dictoptions)); + (void) OidFunctionCall2(initmethod, PointerGetDatum(dictoptions), + ObjectIdGetDatum(InvalidOid)); } ReleaseSysCache(tup); @@ -513,6 +515,8 @@ RemoveTSDictionaryById(Oid dictId) CatalogTupleDelete(relation, &tup->t_self); + ts_dict_shmem_release(dictId); + ReleaseSysCache(tup); heap_close(relation, RowExclusiveLock); diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c index 0c86a581c0..c7dce8cac5 100644 --- a/src/backend/storage/ipc/ipci.c +++ b/src/backend/storage/ipc/ipci.c @@ -44,6 +44,7 @@ #include "storage/procsignal.h" #include "storage/sinvaladt.h" #include "storage/spin.h" +#include "tsearch/ts_shared.h" #include "utils/backend_random.h" #include "utils/snapmgr.h" @@ -150,6 +151,7 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port) size = add_size(size, SyncScanShmemSize()); size = add_size(size, AsyncShmemSize()); size = add_size(size, BackendRandomShmemSize()); + size = add_size(size, TsearchShmemSize()); #ifdef EXEC_BACKEND size =
Re: [PROPOSAL] Shared Ispell dictionaries
Hello, Thank you for your comments. On Thu, Mar 01, 2018 at 08:31:49PM -0800, Andres Freund wrote: > Hi, > > On 2018-02-07 19:28:29 +0300, Arthur Zakirov wrote: > > + { > > + {"max_shared_dictionaries_size", PGC_POSTMASTER, RESOURCES_MEM, > > + gettext_noop("Sets the maximum size of all text search > > dictionaries loaded into shared memory."), > > + gettext_noop("Currently controls only loading of Ispell > > dictionaries. " > > +"If total size of > > simultaneously loaded dictionaries " > > +"reaches the maximum allowed > > size then a new dictionary " > > +"will be loaded into local > > memory of a backend."), > > + GUC_UNIT_KB, > > + }, > > + &max_shared_dictionaries_size, > > + 100 * 1024, 0, MAX_KILOBYTES, > > + NULL, NULL, NULL > > + }, > So this uses shared memory, allocated at server start? That doesn't > seem right. Wouldn't it make more sense to have a > 'num_shared_dictionaries' GUC, and then allocate them with dsm? Or even > better not have any such limit and use a dshash table to point to > individual loaded tables? The patch already uses DSM and a dshash table. The 'max_shared_dictionaries_size' GUC was introduced after discussion with Tomas [1], to limit the amount of memory consumed by loaded dictionaries and to prevent possible memory bloat. Its default value is 100MB. There was a 'shared_dictionaries' GUC before; it was introduced because plain hash tables were used then, not dshash. I replaced the plain hash tables with dshash, removed 'shared_dictionaries' and added 'max_shared_dictionaries_size'. > Is there any chance we can instead convert dictionaries into a form > we can just mmap() into memory? That'd scale a lot higher and more > dynamically? I think the new IspellDictData structure (in 0003-Store-ispell-structures-in-shmem-v5.patch) can already be stored in a binary file and mapped into memory. But mmap() is not used in this patch yet. I can do some experiments and make a prototype. 
1 - https://www.postgresql.org/message-id/d12d9395-922c-64c9-c87d-dd0e1d31440e%402ndquadrant.com -- Arthur Zakirov Postgres Professional: http://www.postgrespro.com Russian Postgres Company
Re: [PROPOSAL] Shared Ispell dictionaries
Hi, On 2018-02-07 19:28:29 +0300, Arthur Zakirov wrote: > + { > + {"max_shared_dictionaries_size", PGC_POSTMASTER, RESOURCES_MEM, > + gettext_noop("Sets the maximum size of all text search > dictionaries loaded into shared memory."), > + gettext_noop("Currently controls only loading of Ispell > dictionaries. " > + "If total size of > simultaneously loaded dictionaries " > + "reaches the maximum allowed > size then a new dictionary " > + "will be loaded into local > memory of a backend."), > + GUC_UNIT_KB, > + }, > + &max_shared_dictionaries_size, > + 100 * 1024, 0, MAX_KILOBYTES, > + NULL, NULL, NULL > + }, So this uses shared memory, allocated at server start? That doesn't seem right. Wouldn't it make more sense to have a 'num_shared_dictionaries' GUC, and then allocate them with dsm? Or even better not have any such limit and use a dshash table to point to individual loaded tables? Is there any chance we can instead convert dictionaries into a form we can just mmap() into memory? That'd scale a lot higher and more dynamically? Regards, Andres
Re: [PROPOSAL] Shared Ispell dictionaries
On Thu, Jan 25, 2018 at 07:51:58PM +0300, Arthur Zakirov wrote: > Attached new version of the patch. Here is a rebased version of the patch due to changes in dict_ispell.c. The patch itself wasn't changed. -- Arthur Zakirov Postgres Professional: http://www.postgrespro.com Russian Postgres Company diff --git a/src/backend/tsearch/spell.c b/src/backend/tsearch/spell.c index b9fdd77e19..e071994523 100644 --- a/src/backend/tsearch/spell.c +++ b/src/backend/tsearch/spell.c @@ -78,6 +78,8 @@ #define tmpalloc(sz) MemoryContextAlloc(Conf->buildCxt, (sz)) #define tmpalloc0(sz) MemoryContextAllocZero(Conf->buildCxt, (sz)) +#define tmpstrdup(str) MemoryContextStrdup(Conf->buildCxt, (str)) + /* * Prepare for constructing an ISpell dictionary. * @@ -498,7 +500,7 @@ NIAddSpell(IspellDict *Conf, const char *word, const char *flag) Conf->Spell[Conf->nspell] = (SPELL *) tmpalloc(SPELLHDRSZ + strlen(word) + 1); strcpy(Conf->Spell[Conf->nspell]->word, word); Conf->Spell[Conf->nspell]->p.flag = (*flag != '\0') - ? cpstrdup(Conf, flag) : VoidString; + ? tmpstrdup(flag) : VoidString; Conf->nspell++; } @@ -1040,7 +1042,7 @@ setCompoundAffixFlagValue(IspellDict *Conf, CompoundAffixFlag *entry, entry->flag.i = i; } else - entry->flag.s = cpstrdup(Conf, s); + entry->flag.s = tmpstrdup(s); entry->flagMode = Conf->flagMode; entry->value = val; @@ -1536,6 +1538,9 @@ nextline: return; isnewformat: + pfree(recoded); + pfree(pstr); + if (oldformat) ereport(ERROR, (errcode(ERRCODE_CONFIG_FILE_ERROR), diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index c45979dee4..725473b7c2 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -1364,6 +1364,35 @@ include_dir 'conf.d' + + max_shared_dictionaries_size (integer) + + max_shared_dictionaries_size configuration parameter + + + + +Sets the maximum size of all text search dictionaries loaded into shared +memory. The default is 100 megabytes (100MB). This +parameter can only be set at server start. 
+ + + +Currently controls only loading of Ispell +dictionaries (see ). +After compiling the dictionary it will be copied into shared memory. +Other backends will then use it from shared memory on first use of the +dictionary, so they don't need to compile the dictionary a second time. + + + +If the total size of simultaneously loaded dictionaries reaches the maximum +allowed size, then new dictionaries will be loaded into the local memory of +a backend. + + + + huge_pages (enum) diff --git a/src/backend/commands/tsearchcmds.c b/src/backend/commands/tsearchcmds.c index 3a843512d1..b6aeae449b 100644 --- a/src/backend/commands/tsearchcmds.c +++ b/src/backend/commands/tsearchcmds.c @@ -39,6 +39,7 @@ #include "nodes/makefuncs.h" #include "parser/parse_func.h" #include "tsearch/ts_cache.h" +#include "tsearch/ts_shared.h" #include "tsearch/ts_utils.h" #include "utils/builtins.h" #include "utils/fmgroids.h" @@ -396,7 +397,8 @@ verify_dictoptions(Oid tmplId, List *dictoptions) * Call the init method and see if it complains. We don't worry about * it leaking memory, since our command will soon be over anyway. 
*/ - (void) OidFunctionCall1(initmethod, PointerGetDatum(dictoptions)); + (void) OidFunctionCall2(initmethod, PointerGetDatum(dictoptions), + ObjectIdGetDatum(InvalidOid)); } ReleaseSysCache(tup); @@ -513,6 +515,8 @@ RemoveTSDictionaryById(Oid dictId) CatalogTupleDelete(relation, &tup->t_self); + ts_dict_shmem_release(dictId); + ReleaseSysCache(tup); heap_close(relation, RowExclusiveLock); diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c index 0c86a581c0..c7dce8cac5 100644 --- a/src/backend/storage/ipc/ipci.c +++ b/src/backend/storage/ipc/ipci.c @@ -44,6 +44,7 @@ #include "storage/procsignal.h" #include "storage/sinvaladt.h" #include "storage/spin.h" +#include "tsearch/ts_shared.h" #include "utils/backend_random.h" #include "utils/snapmgr.h" @@ -150,6 +151,7 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port) size = add_size(size, SyncScanShmemSize()); size = add_size(size, AsyncShmemSize()); size = add_size(size, BackendRandomShmemSize()); + size = add_size(size, TsearchShmemSize()); #ifdef EXEC_BACKEND size = add_size(size, ShmemBackendArraySize()); #endif @@ -271,6 +273,11 @@
Re: [PROPOSAL] Shared Ispell dictionaries
Hello, Thank you for your review! Good catches. On Thu, Jan 25, 2018 at 03:26:46PM +0300, Ildus Kurbangaliev wrote: > In 0001 there are a few lines where only the indentation has changed. Fixed. > 0002: > - TsearchShmemSize - calculating size using hash_estimate_size seems > redundant since you use DSA hash now. Fixed. True, there is no need for hash_estimate_size anymore. > - ts_dict_shmem_release - LWLockAcquire in the beginning makes no > sense, since dict_table couldn't change anyway. Fixed. In an earlier version tsearch_ctl was used here, but I forgot to remove the LWLockAcquire. > 0003: > - ts_dict_shmem_location could return IspellDictData, it makes more > sense. I assume that ts_dict_shmem_location can be used by various types of dictionaries, not only by Ispell. So void * is more suitable here. > 0006: > It's very subjective, but I think it would be nicer to call the option > Shared (as a property of the dictionary) or UseSharedMemory, the boolean > option called SharedMemory sounds weird. Agree. In our offline conversation we settled on Shareable, that is, the dictionary can be shared. It may be more appropriate because setting Shareable=true doesn't guarantee that a dictionary will be allocated in shared memory, due to the max_shared_dictionaries_size GUC. Attached is a new version of the patch. -- Arthur Zakirov Postgres Professional: http://www.postgrespro.com Russian Postgres Company diff --git a/src/backend/tsearch/spell.c b/src/backend/tsearch/spell.c index b9fdd77e19..e071994523 100644 --- a/src/backend/tsearch/spell.c +++ b/src/backend/tsearch/spell.c @@ -78,6 +78,8 @@ #define tmpalloc(sz) MemoryContextAlloc(Conf->buildCxt, (sz)) #define tmpalloc0(sz) MemoryContextAllocZero(Conf->buildCxt, (sz)) +#define tmpstrdup(str) MemoryContextStrdup(Conf->buildCxt, (str)) + /* * Prepare for constructing an ISpell dictionary. 
* @@ -498,7 +500,7 @@ NIAddSpell(IspellDict *Conf, const char *word, const char *flag) Conf->Spell[Conf->nspell] = (SPELL *) tmpalloc(SPELLHDRSZ + strlen(word) + 1); strcpy(Conf->Spell[Conf->nspell]->word, word); Conf->Spell[Conf->nspell]->p.flag = (*flag != '\0') - ? cpstrdup(Conf, flag) : VoidString; + ? tmpstrdup(flag) : VoidString; Conf->nspell++; } @@ -1040,7 +1042,7 @@ setCompoundAffixFlagValue(IspellDict *Conf, CompoundAffixFlag *entry, entry->flag.i = i; } else - entry->flag.s = cpstrdup(Conf, s); + entry->flag.s = tmpstrdup(s); entry->flagMode = Conf->flagMode; entry->value = val; @@ -1536,6 +1538,9 @@ nextline: return; isnewformat: + pfree(recoded); + pfree(pstr); + if (oldformat) ereport(ERROR, (errcode(ERRCODE_CONFIG_FILE_ERROR), diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index 45b2af14eb..46617df852 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -1364,6 +1364,35 @@ include_dir 'conf.d' + + max_shared_dictionaries_size (integer) + + max_shared_dictionaries_size configuration parameter + + + + +Sets the maximum size of all text search dictionaries loaded into shared +memory. The default is 100 megabytes (100MB). This +parameter can only be set at server start. + + + +Currently controls only loading of Ispell +dictionaries (see ). +After compiling the dictionary it will be copied into shared memory. +Other backends will then use it from shared memory on first use of the +dictionary, so they don't need to compile the dictionary a second time. + + + +If the total size of simultaneously loaded dictionaries reaches the maximum +allowed size, then new dictionaries will be loaded into the local memory of +a backend. 
+ + + + huge_pages (enum) diff --git a/src/backend/commands/tsearchcmds.c b/src/backend/commands/tsearchcmds.c index bdf3857ce4..42be77d045 100644 --- a/src/backend/commands/tsearchcmds.c +++ b/src/backend/commands/tsearchcmds.c @@ -39,6 +39,7 @@ #include "nodes/makefuncs.h" #include "parser/parse_func.h" #include "tsearch/ts_cache.h" +#include "tsearch/ts_shared.h" #include "tsearch/ts_utils.h" #include "utils/builtins.h" #include "utils/fmgroids.h" @@ -396,7 +397,8 @@ verify_dictoptions(Oid tmplId, List *dictoptions) * Call the init method and see if it complains. We don't worry about * it leaking memory, since our command will soon be over anyway. */ - (void) OidFunctionCall1(initmethod, PointerGetDatum(dictoptions)); + (void) OidFunctionCall2(initmethod, PointerGetDatum(dictoptions), + ObjectIdGetDatum(InvalidOid)); }
Re: [PROPOSAL] Shared Ispell dictionaries
On Wed, 24 Jan 2018 20:20:41 +0300 Arthur Zakirov wrote: Hi, I did some review of the patch. In 0001 there are a few lines where only the indentation has changed. 0002: - TsearchShmemSize - calculating the size using hash_estimate_size seems redundant since you use a DSA hash now. - ts_dict_shmem_release - LWLockAcquire in the beginning makes no sense, since dict_table couldn't change anyway. 0003: - ts_dict_shmem_location could return IspellDictData, it makes more sense. 0006: It's very subjective, but I think it would be nicer to call the option Shared (as a property of the dictionary) or UseSharedMemory; the boolean option called SharedMemory sounds weird. Overall the patches look good, all tests passed. I tried to break it in a few places where I thought it could be unsafe but did not succeed. -- Ildus Kurbangaliev Postgres Professional: http://www.postgrespro.com Russian Postgres Company
Re: [PROPOSAL] Shared Ispell dictionaries
2018-01-24 20:57 GMT+03:00 Tomas Vondra: > > Thanks. I don't have time to review/test this before FOSDEM, but a > couple of comments regarding some of the points you mentioned. > Thank you for your thoughts. > > I thought about it. And it seems to me that we can use functions > > ts_unload() and ts_reload() instead of new syntax. We already have > > text search functions like ts_lexize() and ts_debug(), and it is > > better to keep consistency. > > This argument seems a bit strange. Both ts_lexize() and ts_debug() are > operating on text values, and are meant to be executed as functions from > SQL - particularly ts_lexize(). It's hard to imagine this implemented as > DDL commands. > > The unload/reload is something that operates on a database object > (dictionary), which already has create/drop/alter DDL. So it seems > somewhat natural to treat unload/reload as another DDL action. > > Taken to an extreme, this argument would essentially mean we should not > have any DDL commands because we have SQL functions. > > That being said, I'm not particularly attached to having this DDL now. > Implementing it seems straight-forward (particularly when we already > have the stuff implemented as functions), and some of the other open > questions seem more important to tackle now. > I understood your opinion. I don't have a strong opinion on the subject yet. And I agree that they can be implemented as future improvements for shared dictionaries. -- Arthur Zakirov Postgres Professional: http://www.postgrespro.com Russian Postgres Company
Re: [PROPOSAL] Shared Ispell dictionaries
Hi, On 01/24/2018 06:20 PM, Arthur Zakirov wrote: > On Sat, Jan 13, 2018 at 06:22:41PM +0300, Arthur Zakirov wrote: >> I think your proposals may be implemented in several patches, so >> they can be applied independently but consistently. I suppose I >> will prepare a new version of the patch with fixes and with initial >> design of new functions and commands soon. > > I attached a new version of the patch. > Thanks. I don't have time to review/test this before FOSDEM, but a couple of comments regarding some of the points you mentioned. >> 3) How do I unload a dictionary from the shared memory? >> ... >> ALTER TEXT SEARCH DICTIONARY x UNLOAD >> >> 4) How do I reload a dictionary? >> ... >> ALTER TEXT SEARCH DICTIONARY x RELOAD > > I thought about it. And it seems to me that we can use functions > ts_unload() and ts_reload() instead of new syntax. We already have > text search functions like ts_lexize() and ts_debug(), and it is > better to keep consistency. This argument seems a bit strange. Both ts_lexize() and ts_debug() are operating on text values, and are meant to be executed as functions from SQL - particularly ts_lexize(). It's hard to imagine this implemented as DDL commands. The unload/reload is something that operates on a database object (dictionary), which already has create/drop/alter DDL. So it seems somewhat natural to treat unload/reload as another DDL action. Taken to an extreme, this argument would essentially mean we should not have any DDL commands because we have SQL functions. That being said, I'm not particularly attached to having this DDL now. Implementing it seems straight-forward (particularly when we already have the stuff implemented as functions), and some of the other open questions seem more important to tackle now. > I think there are two approaches for ts_unload(): > - use DSM's pin and unpin > methods and the invalidation callback, as > it was done when fixing the memory leak. 
It has the drawback that it won't > have an immediate effect, because the DSM will be released only when all > backends unpin the DSM mapping. > - use DSA and the dsa_free() method. As far as I understand, dsa_free() > frees allocated memory immediately. But it requires more work to do, > because we will need some more locks. For instance, what happens > when someone calls ts_lexize() and someone else calls dsa_free() at > the same time. No opinion on this yet, I have to think about it for a bit and look at the code first. >> 7) You mentioned you had to get rid of the compact_palloc0 - can you >> elaborate a bit why that was necessary? Also, when benchmarking the >> impact of this make sure to measure not only the time but also memory >> consumption. > It seems to me that there is no need for compact_palloc0() anymore. Tests > show that the Czech dictionary doesn't consume more memory after the > patch. > That's interesting. I'll do some additional tests to verify the finding. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PROPOSAL] Shared Ispell dictionaries
On Sat, Jan 13, 2018 at 06:22:41PM +0300, Arthur Zakirov wrote: > I think your proposals may be implemented in several patches, so they can > be applied independently but consistently. I suppose I will prepare a new > version of the patch with fixes and with initial design of new functions > and commands soon. I attached a new version of the patch. 0001-Fix-ispell-memory-handling-v3.patch: > allocate memory in the buildCxt. What about adding tmpstrdup to copy a > string into the context? I admit this is mostly nitpicking though. Fixed. Added tmpstrdup. 0002-Retreive-shmem-location-for-ispell-v3.patch: dshash.c is now used instead of dynahash.c. The hash table is created during the first call of a text search function in an instance. The hash table uses the OID of a dictionary instead of file names, so there is no cross-db sharing at all. Added the max_shared_dictionaries_size GUC instead of shared_dictionaries. In the current version it can be set only at server start. If a dictionary is allocated in a backend's memory instead of shared memory, then a LOG message is raised which includes the OID of the dictionary. Fixed the memory leak. When a dictionary is removed or the dictionary cache is invalidated, ts_dict_shmem_release() is called. It unpins the mapping of a dictionary; if the reference count reaches zero, the DSM segment will be unpinned, so the allocated shared memory will be released by Postgres. 0003-Store-ispell-structures-in-shmem-v3.patch: Added documentation fixes. dispell_init() (tmplinit too) has a second argument, dictid. 0004-Update-tmplinit-arguments-v3.patch: It is necessary to fix all dictionaries, including contrib extensions, because of the second argument for tmplinit. tmplinit has the following signature now: dict_init(internal, internal) 0005-pg-ts-shared-dictinaries-view-v3.patch: Added the pg_ts_shared_dictionaries() function and the pg_ts_shared_dictionaries system view. 
They return a list of dictionaries currently in shared memory, with the columns: - dictoid - schemaname - dictname - size 0006-Shared-memory-ispell-option-v3.patch: Added the SharedMemory option for the Ispell dictionary template. It is true by default, because I think it would be good if people didn't have to do anything to allocate dictionaries in shared memory. Setting SharedMemory=false during ALTER TEXT SEARCH DICTIONARY doesn't have an immediate effect. That is because ALTER doesn't force invalidation of the dictionary cache, if I'm not mistaken. > 3) How do I unload a dictionary from the shared memory? > ... > ALTER TEXT SEARCH DICTIONARY x UNLOAD > > 4) How do I reload a dictionary? > ... > ALTER TEXT SEARCH DICTIONARY x RELOAD I thought about it. And it seems to me that we can use functions ts_unload() and ts_reload() instead of new syntax. We already have text search functions like ts_lexize() and ts_debug(), and it is better to keep consistency. I think there are two approaches for ts_unload(): - use DSM's pin and unpin methods and the invalidation callback, as it was done when fixing the memory leak. It has the drawback that it won't have an immediate effect, because the DSM will be released only when all backends unpin the DSM mapping. - use DSA and the dsa_free() method. As far as I understand, dsa_free() frees allocated memory immediately. But it requires more work, because we will need some more locks. For instance, what happens when someone calls ts_lexize() and someone else calls dsa_free() at the same time. > 7) You mentioned you had to get rid of the compact_palloc0 - can you > elaborate a bit why that was necessary? Also, when benchmarking the > impact of this make sure to measure not only the time but also memory > consumption. It seems to me that there is no need for compact_palloc0() anymore. Tests show that the Czech dictionary doesn't consume more memory after the patch. Tests - I've measured the creation time of dictionaries on my 64-bit machine. You can get them from [1]. 
Here the master is 434e6e1484418c55561914600de9e180fc408378. I've measured the French dictionary too because it has an even bigger affix file than the Czech dictionary. With patch: czech_hunspell - 247 ms english_hunspell - 59 ms french_hunspell - 103 ms Master: czech_hunspell - 224 ms english_hunspell - 52 ms french_hunspell - 101 ms Memory: With patch (shared memory size + backend's memory): czech_hunspell - 9573049 + 192584 total in 5 blocks; 1896 free (11 chunks); 190688 used english_hunspell - 1985299 + 21064 total in 6 blocks; 7736 free (13 chunks); 13328 used french_hunspell - 4763456 + 626960 total in 7 blocks; 7680 free (14 chunks); 619280 used Here the French dictionary uses more backend memory because it has a big affix file. Regular expression structures are still stored in backend memory. Master (backend's memory): czech_hunspell - 17181544 total in 2034 blocks; 3584 free (10 chunks); 17177960 used english_hunspell - 4160120 total in 506 blocks; 2792 free (10 chunks); 4157328 used french_hunspell - 11439184 total in 1187 blocks; 18832 free (171
Re: [PROPOSAL] Shared Ispell dictionaries
On 01/15/2018 08:02 PM, Arthur Zakirov wrote: > On Sat, Jan 13, 2018 at 10:33:14PM +0100, Tomas Vondra wrote: >> Not sure if we really need to add the database/schema OIDs. I mentioned >> the unexpected consequences (cross-db sharing) but maybe that's a >> feature we should keep (it reduces memory usage). So perhaps this should >> be another CREATE TEXT SEARCH DICTIONARY parameter, allowing sharing the >> dictionary with other databases? >> >> Aren't we overengineering this? > Another related problem I've noticed is a memory leak. When a > dictionary is loaded and then dropped, it won't be unloaded. > Good point. > I see several approaches: >> 1 - Use the Oid of the dictionary itself as the key instead of dictfile and > afffile. When the dictionary is dropped it will be easily unloaded if it > was loaded. Implementation should be easy, but the drawback is more memory > consumption. > 2 - Use a reference counter with cross-db sharing. When the dictionary is > loaded the counter increases. If all records of a loaded dictionary are dropped > it will be unloaded. > 3 - Or reference counters without cross-db sharing, to avoid possible > confusion. > Here dictfile, afffile and the database Oid will be used as the key. > I think you're approaching the problem from the wrong end, hence asking the wrong question. I think the primary question is "Do we want to share dictionaries cross databases?" and the answer will determine which of the three options is the right one. Another important consideration is the complexity of the patch. In fact, I suggest to make it your goal to make the initial patch as simple as possible. If something is "nice to have" it may wait for v2. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PROPOSAL] Shared Ispell dictionaries
On Sat, Jan 13, 2018 at 10:33:14PM +0100, Tomas Vondra wrote: > Not sure if we really need to add the database/schema OIDs. I mentioned > the unexpected consequences (cross-db sharing) but maybe that's a > feature we should keep (it reduces memory usage). So perhaps this should > be another CREATE TEXT SEARCH DICTIONARY parameter, allowing sharing the > dictionary with other databases? > > Aren't we overengineering this? Another related problem I've noticed is a memory leak. When a dictionary is loaded and then dropped, it won't be unloaded. I see several approaches: 1 - Use the Oid of the dictionary itself as the key instead of dictfile and afffile. When the dictionary is dropped it will be easily unloaded if it was loaded. Implementation should be easy, but the drawback is more memory consumption. 2 - Use a reference counter with cross-db sharing. When the dictionary is loaded the counter increases. If all records of a loaded dictionary are dropped, it will be unloaded. 3 - Or reference counters without cross-db sharing, to avoid possible confusion. Here dictfile, afffile and the database Oid will be used as the key. -- Arthur Zakirov Postgres Professional: http://www.postgrespro.com Russian Postgres Company
Re: [PROPOSAL] Shared Ispell dictionaries
On 01/13/2018 04:22 PM, Arthur Zakirov wrote: > Hello, > > Thank you Tomas for your review. > > On Sat, Jan 13, 2018 at 03:25:55AM +0100, Tomas Vondra wrote: >> allocate memory in the buildCxt. What about adding tmpstrdup to copy a >> string into the context? I admit this is mostly nitpicking though. > ... snip ...> >> 8) One more thing - I've noticed that the hash table uses this key: >> ... >> That is, full paths to the two files, and I'm not sure that's a very >> good idea. Firstly, it's a bit wasteful (1kB per path). But more >> importantly it means all dictionaries referencing the same files will >> share the same chunk of shared memory - not only within a single >> database, but across the whole cluster. That may lead to surprising >> behavior, because e.g. unloading a dictionary in one database will >> affect dictionaries in all other databases referencing the same files. > Hm, indeed. It's worth using only file names instead of full paths. And > it would be a good idea to use more information besides file names. It can be > the Oid of a database and the Oid of a namespace maybe, because a > dictionary can be created in different schemas. > I doubt using filenames (without the directory paths) solves anything, really. The keys still have to be MAXPGPATH because someone could create a very long filename. But I don't think memory consumption is such a big deal, really. With 1000 dictionaries it's still just ~2MB of data, which is negligible compared to the amount of memory saved by sharing the dictionaries. Not sure if we really need to add the database/schema OIDs. I mentioned the unexpected consequences (cross-db sharing) but maybe that's a feature we should keep (it reduces memory usage). So perhaps this should be another CREATE TEXT SEARCH DICTIONARY parameter, allowing sharing the dictionary with other databases? Aren't we overengineering this? > I think your proposals may be implemented in several patches, so they can be applied independently but consistently. 
I suppose I will prepare a new > version of the patch with fixes and with initial design of new functions > and commands soon. > Yes, splitting patches into smaller, more focused bits is a good idea. BTW the current patch fails to document the dictionary sharing. It only mentions it when describing the shared_dictionaries GUC. IMHO the right place for additional details is https://www.postgresql.org/docs/10/static/textsearch-dictionaries.html regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PROPOSAL] Shared Ispell dictionaries
Hi Arthur, I've done some initial review of the patch today, and here are some thoughts: 0001-Fix-ispell-memory-handling-v2.patch This makes sense. The patch simply replaces two cpstrdup() calls with MemoryContextStrdup, but I see spell.c already has two macros to allocate memory in the buildCxt. What about adding tmpstrdup to copy a string into the context? I admit this is mostly nitpicking though. 0002-Retreive-shmem-location-for-ispell-v2.patch I think the GUC name should make it clear it's a maximum number of something, just like "max_parallel_workers" and other such GUCs. When I first saw "shared_dictionaries" in the patch I thought it was a list of dictionary names, or something like that. I have a bunch of additional design questions and proposals (not necessarily required for v1, but perhaps useful for shaping it). 1) Why do we actually need the limit? Is it really necessary / useful? When I wrote shared_ispell back in 2012, all we had were fixed segments allocated at start, and so similar limits were a built-in restriction. But after the DSM stuff was introduced I imagined it would not be necessary. I realize the current implementation requires that, because the hash table is still created in an old-style memory context (and only the dictionaries are in DSM segments). But that seems fairly straightforward to fix by maintaining the hash table in a separate DSM segment too. So lookup of the dictionary DSM would have to first check what the current hash table segment is, and then continue as now. I'm not sure if dynahash can live in a DSM segment, but we already have a hash table that supports that in dshash.c (which is also concurrent, although I'm not sure if that's a major advantage for this use case). 2) Do we actually want/need some limits? Which ones? That is not to say we don't need/want some limits, but the current limit may not be the droid we're looking for, for a couple of reasons. 
Firstly, currently it only matters during startup, when the dynahash is created. So to change the limit (e.g. to increase it) you actually have to restart the database, which is obviously a major hassle. Secondly, dynahash tweaks the values to get proper behavior. For example, it's not using the values directly but some higher value of 2^N form, which means the limit may not be enforced immediately when hitting the GUC, but unexpectedly somewhat later. And finally, I believe this is log-worthy - right now the dictionary load silently switches to backend memory (thus incurring all the parsing overhead). This certainly deserves at least a log message. Actually, I'm not sure "number of dictionaries" is a particularly useful limit in the first place - that's not a number I really care about. But I do care about the amount of memory consumed by the loaded dictionaries. So I do suggest adding such a "max memory for shared dictionaries" limit. I'm not sure we can enforce it strictly, because when deciding where to load the dict we haven't parsed it yet and so don't know how much memory will be required. But I believe a lazy check should be fine (load it, and if we've exceeded the total memory, disable loading additional ones). 3) How do I unload a dictionary from the shared memory? Assume we've reached the limit (it does not matter if it's the number of dictionaries or memory used by them). How do I resolve that without restarting the database? How do I unload a dictionary (which may be unused) from shared memory? ALTER TEXT SEARCH DICTIONARY x UNLOAD 4) How do I reload a dictionary? Assume I've updated the dictionary files (added new words into the files, or something like that). How do I reload the dictionary? Do I have to restart the server, DROP/CREATE everything again, or what? What about instead having something like this: ALTER TEXT SEARCH DICTIONARY x RELOAD 5) Actually, how do I list currently loaded dictionaries (and how much memory they use in the shared memory)? 
6) What other restrictions would be useful? I think it should be possible to specify which ispell dictionaries may be loaded into shared memory, and which should be always loaded into local backend memory. That is, something like CREATE TEXT SEARCH DICTIONARY x ( TEMPLATE = ispell, DictFile = czech, AffFile = czech, StopWords = czech, SharedMemory = true/false (default: false) ); because otherwise the dictionaries will compete for shared memory, and it's unclear which of them will get loaded. For a server with a single application that may not be a huge issue, but think about servers shared by multiple applications, etc. In the extension this was achieved kinda explicitly by definition of a separate 'shared_ispell' template, but if you modify the current one that won't work, of course. 7) You mentioned you had to get rid of the compact_palloc0 - can you elaborate a bit why that was necessary? Also, when benchmarking the impact of this make sure to measure not only the
Re: [PROPOSAL] Shared Ispell dictionaries
Thank you for your answer. On Mon, Jan 08, 2018 at 06:12:37PM +0100, Tomas Vondra wrote: > > I believe Pavel was referring to this extension: > > https://github.com/tvondra/shared_ispell > Oh, understood. > I wasn't going to submit that as in-core solution, but I'm happy you're > making improvements in that direction. I'll take a look at your patch > shortly. > Here is the second version of the patch. But I've noticed a performance regression in ts_lexize() and I will try to find where the overhead hides. -- Arthur Zakirov Postgres Professional: http://www.postgrespro.com Russian Postgres Company
Re: [PROPOSAL] Shared Ispell dictionaries
Hi Arthur, Sorry for the delay, I somehow missed this thread ... On 12/27/2017 10:20 AM, Arthur Zakirov wrote: > On Tue, Dec 26, 2017 at 07:03:48PM +0100, Pavel Stehule wrote: >> >> Tomas had some workable patches related to this topic >> > > Tomas, have you planned to propose it? > I believe Pavel was referring to this extension: https://github.com/tvondra/shared_ispell I wasn't going to submit that as an in-core solution, but I'm happy you're making improvements in that direction. I'll take a look at your patch shortly. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PROPOSAL] Shared Ispell dictionaries
On Sun, Dec 31, 2017 at 06:28:13PM +0300, Arthur Zakirov wrote: > > There are issues to do: > - add the GUC-variable for hash table limit > - fix bugs > - improve comments > - performance testing > Here is the second version of the patch. 0002-Retreive-shmem-location-for-ispell-v2.patch: Fixed some bugs and added the GUC variable "shared_dictionaries". Added documentation for it. I'm not sure about the order of configuration parameters in section "19.4.1. Memory". Now "shared_dictionaries" goes after "shared_buffers". Maybe it would be good to make a patch which will sort the parameters in alphabetical order? 0003-Store-ispell-structures-in-shmem-v2.patch: Fixed some bugs; regression tests pass now. I added more comments and fixed old ones. I also tested with Hunspell dictionaries [1]. They are good too. Results of performance testing of Ispell and Hunspell dictionaries will be ready soon. 1 - github.com/postgrespro/hunspell_dicts -- Arthur Zakirov Postgres Professional: http://www.postgrespro.com Russian Postgres Company diff --git a/src/backend/tsearch/spell.c b/src/backend/tsearch/spell.c index b9fdd77e19..25614f2d31 100644 --- a/src/backend/tsearch/spell.c +++ b/src/backend/tsearch/spell.c @@ -498,7 +498,7 @@ NIAddSpell(IspellDict *Conf, const char *word, const char *flag) Conf->Spell[Conf->nspell] = (SPELL *) tmpalloc(SPELLHDRSZ + strlen(word) + 1); strcpy(Conf->Spell[Conf->nspell]->word, word); Conf->Spell[Conf->nspell]->p.flag = (*flag != '\0') - ? cpstrdup(Conf, flag) : VoidString; + ? 
MemoryContextStrdup(Conf->buildCxt, flag) : VoidString; Conf->nspell++; } @@ -1040,7 +1040,7 @@ setCompoundAffixFlagValue(IspellDict *Conf, CompoundAffixFlag *entry, entry->flag.i = i; } else - entry->flag.s = cpstrdup(Conf, s); + entry->flag.s = MemoryContextStrdup(Conf->buildCxt, s); entry->flagMode = Conf->flagMode; entry->value = val; @@ -1536,6 +1536,9 @@ nextline: return; isnewformat: + pfree(recoded); + pfree(pstr); + if (oldformat) ereport(ERROR, (errcode(ERRCODE_CONFIG_FILE_ERROR), diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index e4a01699e4..858423354e 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -1355,6 +1355,36 @@ include_dir 'conf.d' + + shared_dictionaries (integer) + + shared_dictionaries configuration parameter + + + + +Sets the maximum number of text search dictionaries loaded into shared +memory. The default is 10 dictionaries. + + + +Currently controls only loading of Ispell +dictionaries (see ). +After compiling the dictionary it will be copied into shared memory. +Another backends on first use of the dictionary will use it from shared +memory, so it doesn't need to compile the dictionary second time. +DictFile and AffFile are used to +search the dictionary in shared memory. + + + +If the number of simultaneously loaded dictionaries reaches the maximum +allowed number then a new dictionary will be loaded into local memory of +a backend. 
+ + + + huge_pages (enum) diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c index 0c86a581c0..c7dce8cac5 100644 --- a/src/backend/storage/ipc/ipci.c +++ b/src/backend/storage/ipc/ipci.c @@ -44,6 +44,7 @@ #include "storage/procsignal.h" #include "storage/sinvaladt.h" #include "storage/spin.h" +#include "tsearch/ts_shared.h" #include "utils/backend_random.h" #include "utils/snapmgr.h" @@ -150,6 +151,7 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port) size = add_size(size, SyncScanShmemSize()); size = add_size(size, AsyncShmemSize()); size = add_size(size, BackendRandomShmemSize()); + size = add_size(size, TsearchShmemSize()); #ifdef EXEC_BACKEND size = add_size(size, ShmemBackendArraySize()); #endif @@ -271,6 +273,11 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port) AsyncShmemInit(); BackendRandomShmemInit(); + /* +* Set up shared memory to tsearch +*/ + TsearchShmemInit(); + #ifdef EXEC_BACKEND /* diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c index 71caac1a1f..2446db7266 100644 --- a/src/backend/storage/lmgr/lwlock.c +++ b/src/backend/storage/lmgr/lwlock.c @@ -520,6 +520,7 @@ RegisterLWLockTranches(void) "shared_tuplestore"); LWLockRegisterTranche(LWTRANCHE_TBM, "tbm"); LWLockRegisterTranche(LWTRANCHE_PARALLEL_APPEND, "parallel_append"); +
Re: [PROPOSAL] Shared Ispell dictionaries
Hello, hackers, On Tue, Dec 26, 2017 at 07:48:27PM +0300, Arthur Zakirov wrote: > The patch will be ready and added into the 2018-03 commitfest. > I attached the patch itself. 0001-Fix-ispell-memory-handling.patch: Some strings are allocated via compact_palloc0(). But they are not persistent, so they should be allocated using a temporary memory context. Also, a couple of strings are not released if the .aff file had the new format. 0002-Retreive-shmem-location-for-ispell.patch: Adds the ispell_shmem_location() function, which looks for the location of a dictionary using the .dict and .aff file names. If the location hasn't been allocated in DSM earlier, it is allocated. A shared hash table is used to search for the location. The maximum number of hash table elements is NUM_DICTIONARIES=20 for now. It will be better to use a GUC variable. Also, if the number of elements reaches the limit, it will be good to use the backend's local memory instead of shared memory. 0003-Store-ispell-structures-in-shmem.patch: Introduces the IspellDictBuild and IspellDictData structures and removes the IspellDict structure. IspellDictBuild is used while building the dictionary, if it hasn't been allocated in DSM earlier, within the dispell_build() function. IspellDictBuild has a pointer to the IspellDictData structure, which will be filled with persistent data. After building the dictionary, IspellDictData is copied into the DSM location and the temporary data of IspellDictBuild is released. All prefix trees are stored as flat arrays now. Those arrays are allocated and stored using the NodeArray struct. The required node can be retrieved by its node offset. The AffixData and Affix arrays have an additional offset array to retrieve an element by index. The Affix field (an array of AFFIX) of IspellDictBuild is persistent data too, but it is constructed as a temporary array first, since the Affix array needs to be sorted via qsort() within NISortAffixes(). 
So IspellDictData stores: - AffixData - array of strings, accessed via AffixDataOffset - Affix - array of AFFIX, accessed via AffixOffset - DictNodes, PrefixNodes, SuffixNodes - prefix trees as plain arrays - CompoundAffix - array of CMPDAffix, accessed sequentially I had to remove compact_palloc0(), added by Pavel in 3e5f9412d0a818be77c974e5af710928097b91f3. The Ispell dictionary doesn't need such allocation anymore; it was used for many small allocations. I will definitely check the performance of the Czech dictionary. There are issues to do: - add the GUC variable for the hash table limit - fix bugs - improve comments - performance testing -- Arthur Zakirov Postgres Professional: http://www.postgrespro.com Russian Postgres Company diff --git a/src/backend/tsearch/spell.c b/src/backend/tsearch/spell.c index 9a09ffb20a..6617c2cf05 100644 --- a/src/backend/tsearch/spell.c +++ b/src/backend/tsearch/spell.c @@ -498,7 +498,7 @@ NIAddSpell(IspellDict *Conf, const char *word, const char *flag) Conf->Spell[Conf->nspell] = (SPELL *) tmpalloc(SPELLHDRSZ + strlen(word) + 1); strcpy(Conf->Spell[Conf->nspell]->word, word); Conf->Spell[Conf->nspell]->p.flag = (*flag != '\0') - ? cpstrdup(Conf, flag) : VoidString; + ? 
MemoryContextStrdup(Conf->buildCxt, flag) : VoidString; Conf->nspell++; } @@ -1040,7 +1040,7 @@ setCompoundAffixFlagValue(IspellDict *Conf, CompoundAffixFlag *entry, entry->flag.i = i; } else - entry->flag.s = cpstrdup(Conf, s); + entry->flag.s = MemoryContextStrdup(Conf->buildCxt, s); entry->flagMode = Conf->flagMode; entry->value = val; @@ -1536,6 +1536,9 @@ nextline: return; isnewformat: + pfree(recoded); + pfree(pstr); + if (oldformat) ereport(ERROR, (errcode(ERRCODE_CONFIG_FILE_ERROR), diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c index 2d1ed143e0..86a6df131b 100644 --- a/src/backend/storage/ipc/ipci.c +++ b/src/backend/storage/ipc/ipci.c @@ -44,6 +44,7 @@ #include "storage/procsignal.h" #include "storage/sinvaladt.h" #include "storage/spin.h" +#include "tsearch/ts_shared.h" #include "utils/backend_random.h" #include "utils/snapmgr.h" @@ -150,6 +151,7 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port) size = add_size(size, SyncScanShmemSize()); size = add_size(size, AsyncShmemSize()); size = add_size(size, BackendRandomShmemSize()); + size = add_size(size, TsearchShmemSize()); #ifdef EXEC_BACKEND size = add_size(size, ShmemBackendArraySize()); #endif @@ -271,6 +273,11 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port) AsyncShmemInit(); BackendRandomShmemInit(); + /* +* Set up shared memory to tsearch +*/ + TsearchShmemInit(); + #ifdef EXEC_BACKEND /* diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c index
Re: [PROPOSAL] Shared Ispell dictionaries
On Tue, Dec 26, 2017 at 07:03:48PM +0100, Pavel Stehule wrote: > > Tomas had some workable patches related to this topic > Tomas, have you planned to propose it? -- Arthur Zakirov Postgres Professional: http://www.postgrespro.com Russian Postgres Company
Re: [PROPOSAL] Shared Ispell dictionaries
Arthur Zakirov wrote: > On Tue, Dec 26, 2017 at 01:55:57PM -0300, Alvaro Herrera wrote: > > So what are you going to use instead? > > [ ... ] > > To allocate IspellDict in this case it is necessary to calculate needed > memory size. I think arrays mentioned above will be built first then > memcpy'ed into IspellDict, if it won't take much time. OK, that sounds sensible on first blush. If there are many processes concurrently doing text searches, then the amount of memory saved may be large enough to justify the additional processing (more so if it's just one more memcpy cycle). I hope that there is some way to cope with the ispell data changing underneath -- maybe you'll need some sort of RCU? > > So this will be a large patch not submitted to 2018-01? Depending on > size/complexity I'm not sure it's OK to submit 2018-03 only -- it may be > too late. > > Oh, I see. I try to prepare the patch while 2018-01 is open. It isn't necessary that the patch you present to 2018-01 is final and complete (so don't kill yourself to achieve that) -- a preliminary patch that reviewers can comment on is enough, as long as the final patch you present to 2018-03 is not *too* different. But any medium-large patch whose first post is to the last commitfest of a cycle is likely to be thrown out to the next cycle's first commitfest very quickly. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PROPOSAL] Shared Ispell dictionaries
Thank you for your feedback. On Tue, Dec 26, 2017 at 01:55:57PM -0300, Alvaro Herrera wrote: > So what are you going to use instead? For example, AffixNode and AffixNodeData represent the prefix tree of an affix list. They are accessed via the Suffix and Prefix pointers of the IspellDict struct now. Instead, all affix nodes should be placed into an array and accessed by an offset. The Suffix array goes first, the Prefix array after it. AffixNodeData will access a child node by an offset too. The AffixNodeData struct has an array of pointers to AFFIX structs. This array with all the AFFIX data can be stored within AffixNodeData. Or AffixNodeData can have an array of indexes into a single AFFIX array, which is stored within IspellDict before or after Suffix and Prefix. The same goes for the prefix tree of a word list, represented by the SPNode struct. It might be stored as an array after the Prefix array. The AffixData and CompoundAffix arrays go after them. To allocate IspellDict in this case it is necessary to calculate the needed memory size. I think the arrays mentioned above will be built first and then memcpy'ed into IspellDict, if it won't take much time. Hope it makes sense and is reasonable. > > So this will be a large patch not submitted to 2018-01? Depending on > size/complexity I'm not sure it's OK to submit 2018-03 only -- it may be > too late. > Oh, I see. I'll try to prepare the patch while 2018-01 is open. -- Arthur Zakirov Postgres Professional: http://www.postgrespro.com Russian Postgres Company
Re: [PROPOSAL] Shared Ispell dictionaries
2017-12-26 17:55 GMT+01:00 Alvaro Herrera: > Arthur Zakirov wrote: > > > Implementation > > -- > > > > It is necessary to change all structures related with IspellDict: > > SPNode, AffixNode, AFFIX, CMPDAffix, IspellDict itself. They all > > shouldn't use pointers for this reason. Others are used only during > > dictionary building. > > So what are you going to use instead? > > > It would be good to store in a shared memory StopList struct too. > > Sure (probably a separate patch though). > > > All fields of IspellDict struct, which are used only during dictionary > > building, will be moved into new IspellDictBuild to decrease needed > > shared memory size. And they are going to be released by buildCxt. > > > > Each dictionary will be stored in its own dsm segment. > > All that sounds reasonable. > > > The patch will be ready and added into the 2018-03 commitfest. > > So this will be a large patch not submitted to 2018-01? Depending on > size/complexity I'm not sure it's OK to submit 2018-03 only -- it may be > too late. > Tomas had some workable patches related to this topic Regards Pavel > > -- > Álvaro Herrera https://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services > >
Re: [PROPOSAL] Shared Ispell dictionaries
Arthur Zakirov wrote: > Implementation > -- > > It is necessary to change all structures related with IspellDict: > SPNode, AffixNode, AFFIX, CMPDAffix, IspellDict itself. They all > shouldn't use pointers for this reason. Others are used only during > dictionary building. So what are you going to use instead? > It would be good to store in a shared memory StopList struct too. Sure (probably a separate patch though). > All fields of IspellDict struct, which are used only during dictionary > building, will be moved into new IspellDictBuild to decrease needed > shared memory size. And they are going to be released by buildCxt. > > Each dictionary will be stored in its own dsm segment. All that sounds reasonable. > The patch will be ready and added into the 2018-03 commitfest. So this will be a large patch not submitted to 2018-01? Depending on size/complexity I'm not sure it's OK to submit 2018-03 only -- it may be too late. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services