Re: [PROPOSAL] Shared Ispell dictionaries
On Fri, Apr 5, 2019 at 8:41 PM Alvaro Herrera wrote:
> Is 0001 a bugfix?

Yep, it is a bugfix and can be applied independently. The fix allocates
temporary strings in the temporary context Conf->buildCxt.

-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
Re: [PROPOSAL] Shared Ispell dictionaries
Is 0001 a bugfix?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PROPOSAL] Shared Ispell dictionaries
On 25.02.2019 14:33, Arthur Zakirov wrote:
> It seems to me Tom and Andres also vote for the mmap() approach. I
> think I need to look closely at mmap(). I've labeled the patch as
> 'v13'.

Unfortunately I haven't come up with a new patch yet, so I marked the
entry as "Returned with feedback" for now.

-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
Re: [PROPOSAL] Shared Ispell dictionaries
On 21.02.2019 19:13, Robert Haas wrote:
> So I think it's better to have each backend locally make a decision
> about when that particular backend no longer needs the dictionary, and
> then let the system automatically clean up the ones that are needed by
> nobody.

Yep, it wouldn't be hard to implement.

> Perhaps a better approach still would be to do what Andres proposed
> back in March:
>
> #> Is there any chance we can instead convert dictionaries into a form
> #> we can just mmap() into memory? That'd scale a lot higher and more
> #> dynamically?
>
> The current approach inherently involves double-buffering: you've got
> the filesystem cache containing the data read from disk, and then the
> DSM containing the converted form of the data. Having something that
> you could just mmap() would avoid that, plus it would become a lot
> less critical to keep the mappings around. You could probably just
> have individual queries mmap() it for as long as they need it and then
> tear out the mapping when they finish executing; keeping the mappings
> across queries likely wouldn't be too important in this case. The
> downside is that you'd probably need to teach resowner.c about
> mappings created via mmap() so that you don't leak mappings on an
> abort, but that's probably not a crazy difficult problem.

It seems to me Tom and Andres also vote for the mmap() approach. I think
I need to look closely at mmap(). I've labeled the patch as 'v13'.

-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
Re: [PROPOSAL] Shared Ispell dictionaries
On February 21, 2019 10:08:00 AM PST, Tom Lane wrote:
> Robert Haas writes:
>> Perhaps a better approach still would be to do what Andres proposed
>> back in March:
>
>> #> Is there any chance we can instead convert dictionaries into a
>> #> form we can just mmap() into memory? That'd scale a lot higher and
>> #> more dynamically?
>
> That seems awfully attractive. I was about to question whether we
> could assume that mmap() works everywhere, but it's required by SUSv2
> ... and if anybody has anything sufficiently lame for it not to work,
> we could fall back on malloc-a-hunk-of-memory-and-read-in-the-file.
>
> We'd need a bunch of work to design a position-independent binary
> representation for dictionaries, and then some tool to produce disk
> files containing that, so this isn't exactly a quick route to a
> solution. On the other hand, it isn't sounding like the current patch
> is getting close to committable either.
>
> (Actually, I guess you need a PI representation of a dictionary to put
> it in a DSM either, so presumably that part of the work is done
> already; although we might also wish for architecture independence of
> the disk files, which we probably don't have right now.)

That's what I was pushing for ages ago...

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.
Re: [PROPOSAL] Shared Ispell dictionaries
Robert Haas writes:
> Perhaps a better approach still would be to do what Andres proposed
> back in March:

> #> Is there any chance we can instead convert dictionaries into a form
> #> we can just mmap() into memory? That'd scale a lot higher and more
> #> dynamically?

That seems awfully attractive. I was about to question whether we could
assume that mmap() works everywhere, but it's required by SUSv2 ... and
if anybody has anything sufficiently lame for it not to work, we could
fall back on malloc-a-hunk-of-memory-and-read-in-the-file.

We'd need a bunch of work to design a position-independent binary
representation for dictionaries, and then some tool to produce disk
files containing that, so this isn't exactly a quick route to a
solution. On the other hand, it isn't sounding like the current patch is
getting close to committable either.

(Actually, I guess you need a PI representation of a dictionary to put
it in a DSM either, so presumably that part of the work is done already;
although we might also wish for architecture independence of the disk
files, which we probably don't have right now.)

regards, tom lane
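[Editor's note: Tom's fallback idea — mmap() the on-disk dictionary image where available, otherwise malloc a hunk of memory and read the file in — can be sketched in plain C. This is only an illustration, not code from the patch; the DictImage struct and the load_dict_image()/unload_dict_image() names are invented for the example.]

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

typedef struct DictImage
{
	void	   *data;		/* start of the position-independent image */
	size_t		size;		/* length in bytes */
	int			is_mmap;	/* 1 if mapped, 0 if malloc'd fallback */
} DictImage;

/* Map a dictionary file read-only; fall back to malloc+read on failure. */
static int
load_dict_image(const char *path, DictImage *img)
{
	struct stat st;
	int			fd = open(path, O_RDONLY);

	if (fd < 0 || fstat(fd, &st) < 0)
	{
		if (fd >= 0)
			close(fd);
		return -1;
	}
	img->size = (size_t) st.st_size;

	img->data = mmap(NULL, img->size, PROT_READ, MAP_SHARED, fd, 0);
	if (img->data != MAP_FAILED)
		img->is_mmap = 1;
	else
	{
		/* sufficiently lame platform: read the file into private memory */
		img->data = malloc(img->size);
		if (img->data == NULL ||
			read(fd, img->data, img->size) != (ssize_t) img->size)
		{
			free(img->data);
			close(fd);
			return -1;
		}
		img->is_mmap = 0;
	}
	close(fd);
	return 0;
}

static void
unload_dict_image(DictImage *img)
{
	if (img->is_mmap)
		munmap(img->data, img->size);
	else
		free(img->data);
}
```

A real implementation would additionally need the resowner.c bookkeeping Robert mentions, so an aborted transaction cannot leak the mapping.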
Re: [PROPOSAL] Shared Ispell dictionaries
On Thu, Feb 21, 2019 at 8:28 AM Arthur Zakirov wrote:
> Your approach looks simpler. It is necessary just to periodically scan
> the dictionaries' cache hash table and not call dsm_pin_segment() when
> a DSM segment is initialized. It also means that a dictionary is
> loaded into DSM only while there is a backend attached to the
> dictionary's DSM.

Right. I think that having a central facility that tries to decide
whether or not a dictionary should be kept in shared memory, e.g. based
on a cache size parameter, isn't likely to work well. The problem is
that if we make a decision that a dictionary should be evicted because
it's causing us to exceed the cache size threshold, then we have no way
to implement that decision. We can't force other backends to remove the
mapping immediately, nor can we really bound the time before they
respond to a request to unmap it. They might be in the middle of using
it. So I think it's better to have each backend locally make a decision
about when that particular backend no longer needs the dictionary, and
then let the system automatically clean up the ones that are needed by
nobody.

Perhaps a better approach still would be to do what Andres proposed back
in March:

#> Is there any chance we can instead convert dictionaries into a form
#> we can just mmap() into memory? That'd scale a lot higher and more
#> dynamically?

The current approach inherently involves double-buffering: you've got
the filesystem cache containing the data read from disk, and then the
DSM containing the converted form of the data. Having something that you
could just mmap() would avoid that, plus it would become a lot less
critical to keep the mappings around. You could probably just have
individual queries mmap() it for as long as they need it and then tear
out the mapping when they finish executing; keeping the mappings across
queries likely wouldn't be too important in this case.

The downside is that you'd probably need to teach resowner.c about
mappings created via mmap() so that you don't leak mappings on an abort,
but that's probably not a crazy difficult problem.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [PROPOSAL] Shared Ispell dictionaries
On 21.02.2019 15:45, Robert Haas wrote:
> On Wed, Feb 20, 2019 at 9:33 AM Arthur Zakirov wrote:
>> I'm working on the (b) approach. I thought about a priority queue
>> structure. There is no such ready-made structure within the
>> PostgreSQL sources except binaryheap.c, but it isn't for concurrent
>> algorithms.
>
> I don't see why you need a priority queue or, really, any other fancy
> data structure. It seems like all you need to do is somehow set it up
> so that a backend which doesn't use a dictionary for a while will
> dsm_detach() the segment. Eventually an unused dictionary will have no
> remaining references and will go away.

Hm, I hadn't thought of it that way. Agreed that using a new data
structure is overengineering.

In the current patch all DSM segments are pinned (dsm_pin_segment() is
called for each), so a dictionary lives in shared memory even if nobody
holds a reference to it. I thought about periodically scanning the
shared hash table and unpinning old and unused dictionaries. But this
approach needs a sequential scan facility for dshash. Happily there is
the patch from Kyotaro-san (the
v16-0001-sequential-scan-for-dshash.patch part):
https://www.postgresql.org/message-id/20190221.160555.191280262.horiguchi.kyot...@lab.ntt.co.jp

Your approach looks simpler. It is necessary just to periodically scan
the dictionaries' cache hash table and not call dsm_pin_segment() when a
DSM segment is initialized. It also means that a dictionary is loaded
into DSM only while there is at least one backend attached to the
dictionary's DSM.

-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
Re: [PROPOSAL] Shared Ispell dictionaries
On Wed, Feb 20, 2019 at 9:33 AM Arthur Zakirov wrote:
> I'm working on the (b) approach. I thought about a priority queue
> structure. There is no such ready-made structure within the PostgreSQL
> sources except binaryheap.c, but it isn't for concurrent algorithms.

I don't see why you need a priority queue or, really, any other fancy
data structure. It seems like all you need to do is somehow set it up so
that a backend which doesn't use a dictionary for a while will
dsm_detach() the segment. Eventually an unused dictionary will have no
remaining references and will go away.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [PROPOSAL] Shared Ispell dictionaries
Hello,

I've created a new commitfest entry since the previous entry was closed
with status "Returned with feedback":
https://commitfest.postgresql.org/22/2007/

I attached a new version of the patch. There are changes only in
0003-Retrieve-shared-location-for-dict-v18.patch. I added a reference
counter to the shared hash table's dictionary entries. It is necessary
to avoid memory bloat: shared hash table entries should be deleted when
there are a lot of ALTER and DROP TEXT SEARCH DICTIONARY commands. The
previous version of the patch released unused DSM segments but left
shared hash table entries untouched.

There was a refcnt before:
https://www.postgresql.org/message-id/20180403115720.GA7450%40zakirov.localdomain
But I didn't fully understand how on_dsm_detach() works.

On 22.01.2019 22:17, Tomas Vondra wrote:
> I think there are essentially two ways:
>
> (a) Define max amount of memory available for shared dictionaries, and
> come up with an eviction algorithm. This will be tricky, because when
> the frequently-used dictionaries need a bit more memory than the
> limit, this will result in thrashing (evict+load over and over).
>
> (b) Define what "unused" means for dictionaries, and unload
> dictionaries that become unused. For example, we could track the
> timestamp of the last time each dict was used, and decide that
> dictionaries unused for 5 or more minutes are unused. And evict those.
>
> The advantage of (b) is that it adapts automatically, more or less.
> When you have a bunch of frequently used dictionaries, the amount of
> shared memory increases. If you stop using them, it decreases after a
> while. And rarely used dicts won't force eviction of the frequently
> used ones.

I'm working on the (b) approach. I thought about a priority queue
structure. There is no such ready-made structure within the PostgreSQL
sources except binaryheap.c, but it isn't for concurrent algorithms.
-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

>From 3e220e259eebc6b9730c9500176015b04e588cae Mon Sep 17 00:00:00 2001
From: Arthur Zakirov
Date: Thu, 17 Jan 2019 14:27:32 +0300
Subject: [PATCH 1/4] Fix-ispell-memory-handling

---
 src/backend/tsearch/spell.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/src/backend/tsearch/spell.c b/src/backend/tsearch/spell.c
index eb39466b22..eb8416ce7f 100644
--- a/src/backend/tsearch/spell.c
+++ b/src/backend/tsearch/spell.c
@@ -78,6 +78,8 @@
 #define tmpalloc(sz)  MemoryContextAlloc(Conf->buildCxt, (sz))
 #define tmpalloc0(sz)  MemoryContextAllocZero(Conf->buildCxt, (sz))
 
+#define tmpstrdup(str)  MemoryContextStrdup(Conf->buildCxt, (str))
+
 /*
  * Prepare for constructing an ISpell dictionary.
  *
@@ -498,7 +500,7 @@ NIAddSpell(IspellDict *Conf, const char *word, const char *flag)
 	Conf->Spell[Conf->nspell] = (SPELL *) tmpalloc(SPELLHDRSZ + strlen(word) + 1);
 	strcpy(Conf->Spell[Conf->nspell]->word, word);
 	Conf->Spell[Conf->nspell]->p.flag = (*flag != '\0')
-		? cpstrdup(Conf, flag) : VoidString;
+		? tmpstrdup(flag) : VoidString;
 	Conf->nspell++;
 }
 
@@ -1040,7 +1042,7 @@ setCompoundAffixFlagValue(IspellDict *Conf, CompoundAffixFlag *entry,
 		entry->flag.i = i;
 	}
 	else
-		entry->flag.s = cpstrdup(Conf, s);
+		entry->flag.s = tmpstrdup(s);
 
 	entry->flagMode = Conf->flagMode;
 	entry->value = val;
@@ -1541,6 +1543,9 @@ nextline:
 	return;
 
 isnewformat:
+	pfree(recoded);
+	pfree(pstr);
+
 	if (oldformat)
 		ereport(ERROR,
 				(errcode(ERRCODE_CONFIG_FILE_ERROR),
-- 
2.20.1

>From 291667b579641176ca74eaa343521dd5c258a744 Mon Sep 17 00:00:00 2001
From: Arthur Zakirov
Date: Thu, 17 Jan 2019 15:05:44 +0300
Subject: [PATCH 2/4] Change-tmplinit-argument

---
 contrib/dict_int/dict_int.c          |  4 +-
 contrib/dict_xsyn/dict_xsyn.c        |  4 +-
 contrib/unaccent/unaccent.c          |  4 +-
 src/backend/commands/tsearchcmds.c   | 10 ++---
 src/backend/snowball/dict_snowball.c |  4 +-
 src/backend/tsearch/dict_ispell.c    |  4 +-
 src/backend/tsearch/dict_simple.c    |  4 +-
 src/backend/tsearch/dict_synonym.c   |  4 +-
 src/backend/tsearch/dict_thesaurus.c |  4 +-
 src/backend/utils/cache/ts_cache.c   | 13 +++++-
 src/include/tsearch/ts_cache.h       |  4 ++
 src/include/tsearch/ts_public.h      | 67 ++++++++++++++++++++++++++--
 12 files changed, 105 insertions(+), 21 deletions(-)

diff --git a/contrib/dict_int/dict_int.c b/contrib/dict_int/dict_int.c
index 628b9769c3..ddde55eee4 100644
--- a/contrib/dict_int/dict_int.c
+++ b/contrib/dict_int/dict_int.c
@@ -30,7 +30,7 @@ PG_FUNCTION_INFO_V1(dintdict_lexize);
 Datum
 dintdict_init(PG_FUNCTION_ARGS)
 {
-	List	   *dictoptions = (List *) PG_GETARG_POINTER(0);
+	DictInitData *init_data = (DictInitData *) PG_GETARG_POINTER(0);
 	DictInt    *d;
 	ListCell   *l;
 
@@ -38,7 +38,7 @@ dintdict_init(PG_FUNCTION_ARGS)
 	d->maxlen = 6;
 	d->rejectlong = false;
 
-	foreach(l, dictoptions)
+	foreach(l, init_data->dict_options)
 	{
 		DefElem    *defel = (DefElem *) lfirst(l);
 
diff --git
Re: [PROPOSAL] Shared Ispell dictionaries
Hi,

On 2019-02-01 09:40:44 -0500, Robert Haas wrote:
> On Tue, Jan 22, 2019 at 2:17 PM Tomas Vondra wrote:
>> I think there are essentially two ways:
>>
>> (a) Define max amount of memory available for shared dictionaries,
>> and come up with an eviction algorithm. This will be tricky, because
>> when the frequently-used dictionaries need a bit more memory than the
>> limit, this will result in thrashing (evict+load over and over).
>>
>> (b) Define what "unused" means for dictionaries, and unload
>> dictionaries that become unused. For example, we could track the
>> timestamp of the last time each dict was used, and decide that
>> dictionaries unused for 5 or more minutes are unused. And evict
>> those.
>>
>> The advantage of (b) is that it adapts automatically, more or less.
>> When you have a bunch of frequently used dictionaries, the amount of
>> shared memory increases. If you stop using them, it decreases after a
>> while. And rarely used dicts won't force eviction of the frequently
>> used ones.
>
> +1 for (b).

This patch has been waiting on author for two weeks, the commitfest has
ended, and there's substantial work needed. Therefore I'm marking the
patch as returned with feedback. Please resubmit a new version once the
feedback has been addressed.

Greetings,

Andres Freund
Re: [PROPOSAL] Shared Ispell dictionaries
On Tue, Jan 22, 2019 at 2:17 PM Tomas Vondra wrote:
> I think there are essentially two ways:
>
> (a) Define max amount of memory available for shared dictionaries, and
> come up with an eviction algorithm. This will be tricky, because when
> the frequently-used dictionaries need a bit more memory than the
> limit, this will result in thrashing (evict+load over and over).
>
> (b) Define what "unused" means for dictionaries, and unload
> dictionaries that become unused. For example, we could track the
> timestamp of the last time each dict was used, and decide that
> dictionaries unused for 5 or more minutes are unused. And evict those.
>
> The advantage of (b) is that it adapts automatically, more or less.
> When you have a bunch of frequently used dictionaries, the amount of
> shared memory increases. If you stop using them, it decreases after a
> while. And rarely used dicts won't force eviction of the frequently
> used ones.

+1 for (b).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [PROPOSAL] Shared Ispell dictionaries
On 01.02.2019 12:09, Arthur Zakirov wrote:
> Thanks for sharing your ideas, Tomas. Unfortunately I won't manage to
> develop a new version of the patch till the end of the commitfest due
> to lack of time. I'll think about the second approach. Tracking the
> timestamp of the last time a dict was used may be difficult though and
> may slow down FTS...
>
> I'll move the patch to the next commitfest.

Oh, it seems it can't be moved to the next commitfest from the "Waiting
on Author" status.

-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
Re: [PROPOSAL] Shared Ispell dictionaries
On 22.01.2019 22:17, Tomas Vondra wrote:
> On 1/22/19 7:36 PM, Arthur Zakirov wrote:
>> max_shared_dictionaries_size can be renamed to
>> shared_dictionaries_cleanup_threshold.
>
> That really depends on what exactly the threshold does. If it only
> triggers cleanup but does not enforce a maximum amount of memory used
> by dictionaries, then this name seems OK. If it ensures a max amount
> of memory, the max_..._size name would be better.

Yep, I thought about the first approach.

> I think there are essentially two ways:
>
> (a) Define max amount of memory available for shared dictionaries, and
> come up with an eviction algorithm. This will be tricky, because when
> the frequently-used dictionaries need a bit more memory than the
> limit, this will result in thrashing (evict+load over and over).
>
> (b) Define what "unused" means for dictionaries, and unload
> dictionaries that become unused. For example, we could track the
> timestamp of the last time each dict was used, and decide that
> dictionaries unused for 5 or more minutes are unused. And evict those.
>
> The advantage of (b) is that it adapts automatically, more or less.
> When you have a bunch of frequently used dictionaries, the amount of
> shared memory increases. If you stop using them, it decreases after a
> while. And rarely used dicts won't force eviction of the frequently
> used ones.

Thanks for sharing your ideas, Tomas. Unfortunately I won't manage to
develop a new version of the patch till the end of the commitfest due to
lack of time. I'll think about the second approach. Tracking the
timestamp of the last time a dict was used may be difficult though and
may slow down FTS...

I'll move the patch to the next commitfest.

-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
Re: [PROPOSAL] Shared Ispell dictionaries
On 1/22/19 7:36 PM, Arthur Zakirov wrote:
> Mon, Jan 21, 2019 at 19:42, Arthur Zakirov:
>>
>> On 21.01.2019 17:56, Tomas Vondra wrote:
>>> I wonder if we could devise some simple cache eviction policy. We
>>> don't have any memory limit GUC anymore, but maybe we could unload
>>> dictionaries that were unused for a sufficient amount of time (a
>>> couple of minutes or so). Of course, the question is when exactly
>>> that would happen (it seems far too expensive to invoke on each dict
>>> access, and it should happen even when the dicts are not accessed at
>>> all).
>>
>> Yes, I thought about such a feature too. Agreed, it could be
>> expensive since we need to scan the pg_ts_dict table to get the list
>> of dictionaries (we can't scan a dshash_table).
>>
>> I don't have a good solution yet. I just had a thought to bring back
>> max_shared_dictionaries_size. Then we can unload dictionaries (and
>> scan the pg_ts_dict table) that were accessed a long time ago if we
>> reached the size limit.
>> We can't set an exact size limit since we can't release the memory
>> immediately. So max_shared_dictionaries_size can be renamed to
>> shared_dictionaries_threshold. If it is equal to "0" then PostgreSQL
>> has unlimited space for dictionaries.
>
> I want to propose to clean up segments during vacuum/autovacuum. I'm
> not aware of the policy of cleaning up objects besides relations
> during vacuum/autovacuum. Could it be a good idea?

I doubt that's a good idea, for a couple of reasons. For example, would
it be bound to autovacuum on a particular object, or would it happen as
part of each vacuum run? If the dict cleanup happens only when vacuuming
a particular object, then which one? If it happens on each autovacuum
run, then that may easily be far too frequent (it essentially makes the
cases with too frequent autovacuum runs even worse). But also, what
happens when there is only minimal write activity and thus no regular
autovacuum runs? Surely we should still do the dict cleanup.
> Vacuum might unload dictionaries when the total size of loaded
> dictionaries exceeds a threshold. When it happens vacuum scans loaded
> dictionaries and unloads (unpins segments and removes hash table
> entries) those dictionaries which aren't mapped to any backend process
> anymore (they stay resident otherwise because dsm_pin_segment() is
> called).

Then why bound that to autovacuum at all? Why not just make it part of
loading the dictionary?

> max_shared_dictionaries_size can be renamed to
> shared_dictionaries_cleanup_threshold.

That really depends on what exactly the threshold does. If it only
triggers cleanup but does not enforce a maximum amount of memory used by
dictionaries, then this name seems OK. If it ensures a max amount of
memory, the max_..._size name would be better.

I think there are essentially two ways:

(a) Define max amount of memory available for shared dictionaries, and
come up with an eviction algorithm. This will be tricky, because when
the frequently-used dictionaries need a bit more memory than the limit,
this will result in thrashing (evict+load over and over).

(b) Define what "unused" means for dictionaries, and unload dictionaries
that become unused. For example, we could track the timestamp of the
last time each dict was used, and decide that dictionaries unused for 5
or more minutes are unused. And evict those.

The advantage of (b) is that it adapts automatically, more or less. When
you have a bunch of frequently used dictionaries, the amount of shared
memory increases. If you stop using them, it decreases after a while.
And rarely used dicts won't force eviction of the frequently used ones.

cheers

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
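[Editor's note: approach (b) — evict dictionaries idle for five minutes or more — can be illustrated with a small last-used-timestamp sweep. This is a hedged sketch, not patch code; DictCacheEntry, dict_touch(), and dict_sweep() are invented names, and the loaded flag stands in for unpinning a real DSM segment.]

```c
#include <time.h>

#define DICT_IDLE_SECONDS (5 * 60)	/* "unused" threshold from the discussion */

typedef struct DictCacheEntry
{
	const char *name;
	time_t		last_used;	/* timestamp of the last lookup using this dict */
	int			loaded;		/* stand-in for "DSM segment is pinned" */
} DictCacheEntry;

/* Mark a dictionary as used "now", resetting its idle clock. */
static void
dict_touch(DictCacheEntry *e, time_t now)
{
	e->last_used = now;
	e->loaded = 1;
}

/*
 * Unload every dictionary that has been idle for DICT_IDLE_SECONDS or
 * more; returns how many entries were evicted.
 */
static int
dict_sweep(DictCacheEntry *entries, int n, time_t now)
{
	int			evicted = 0;

	for (int i = 0; i < n; i++)
	{
		if (entries[i].loaded &&
			now - entries[i].last_used >= DICT_IDLE_SECONDS)
		{
			entries[i].loaded = 0;	/* stand-in for unpinning the DSM */
			evicted++;
		}
	}
	return evicted;
}
```

The open question from the thread remains visible even in the sketch: someone has to call dict_sweep() periodically, since per-access invocation is too expensive and idle dictionaries generate no accesses at all.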
Re: [PROPOSAL] Shared Ispell dictionaries
Mon, Jan 21, 2019 at 19:42, Arthur Zakirov:
>
> On 21.01.2019 17:56, Tomas Vondra wrote:
>> I wonder if we could devise some simple cache eviction policy. We
>> don't have any memory limit GUC anymore, but maybe we could unload
>> dictionaries that were unused for a sufficient amount of time (a
>> couple of minutes or so). Of course, the question is when exactly
>> that would happen (it seems far too expensive to invoke on each dict
>> access, and it should happen even when the dicts are not accessed at
>> all).
>
> Yes, I thought about such a feature too. Agreed, it could be expensive
> since we need to scan the pg_ts_dict table to get the list of
> dictionaries (we can't scan a dshash_table).
>
> I don't have a good solution yet. I just had a thought to bring back
> max_shared_dictionaries_size. Then we can unload dictionaries (and
> scan the pg_ts_dict table) that were accessed a long time ago if we
> reached the size limit.
> We can't set an exact size limit since we can't release the memory
> immediately. So max_shared_dictionaries_size can be renamed to
> shared_dictionaries_threshold. If it is equal to "0" then PostgreSQL
> has unlimited space for dictionaries.

I want to propose to clean up segments during vacuum/autovacuum. I'm not
aware of the policy of cleaning up objects besides relations during
vacuum/autovacuum. Could it be a good idea?

Vacuum might unload dictionaries when the total size of loaded
dictionaries exceeds a threshold. When that happens vacuum scans loaded
dictionaries and unloads (unpins segments and removes hash table
entries) those dictionaries which aren't mapped to any backend process
anymore (they stay resident otherwise because dsm_pin_segment() is
called).

max_shared_dictionaries_size can be renamed to
shared_dictionaries_cleanup_threshold.

-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
Re: [PROPOSAL] Shared Ispell dictionaries
On 21.01.2019 17:56, Tomas Vondra wrote:
> On 1/21/19 12:51 PM, Arthur Zakirov wrote:
>> I'll try to implement the syntax you suggested earlier:
>>
>> ALTER TEXT SEARCH DICTIONARY x UNLOAD/RELOAD
>>
>> The main point here is that UNLOAD/RELOAD can't release the memory
>> immediately, because some other backend may pin a DSM.
>>
>> The second point we should consider (I think) is how we know which
>> dictionary should be unloaded. There was such a function earlier,
>> which was removed. But what about adding the information to the
>> "\dFd" psql command output? It could be a column which shows whether
>> a dictionary is loaded.
>
> [...] The only thing we have is "unload" capability by closing the
> connection [...]

BTW, even if the connection was closed and there are no other
connections, a dictionary still remains "loaded". That is because
dsm_pin_segment() is called while loading the dictionary into DSM.

> [...] I wonder if we could devise some simple cache eviction policy.
> We don't have any memory limit GUC anymore, but maybe we could unload
> dictionaries that were unused for a sufficient amount of time (a
> couple of minutes or so). Of course, the question is when exactly that
> would happen (it seems far too expensive to invoke on each dict
> access, and it should happen even when the dicts are not accessed at
> all).

Yes, I thought about such a feature too. Agreed, it could be expensive
since we need to scan the pg_ts_dict table to get the list of
dictionaries (we can't scan a dshash_table).

I don't have a good solution yet. I just had a thought to bring back
max_shared_dictionaries_size. Then we can unload dictionaries (and scan
the pg_ts_dict table) that were accessed a long time ago if we reached
the size limit.

We can't set an exact size limit since we can't release the memory
immediately. So max_shared_dictionaries_size can be renamed to
shared_dictionaries_threshold. If it is equal to "0" then PostgreSQL has
unlimited space for dictionaries.

-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
Re: [PROPOSAL] Shared Ispell dictionaries
On 1/21/19 12:51 PM, Arthur Zakirov wrote:
> On 21.01.2019 02:43, Tomas Vondra wrote:
>> On 1/20/19 11:21 PM, Andres Freund wrote:
>>> On 2019-01-20 23:15:35 +0100, Tomas Vondra wrote:
>>>> Thanks. I've reviewed v17 today and I haven't discovered any new
>>>> issues so far. If everything goes fine and no one protests, I plan
>>>> to get it committed over the next week or so.
>>>
>>> There doesn't seem to be any docs about what's needed to be able to
>>> take advantage of shared dicts, and how to prevent them from
>>> permanently taking up a significant share of memory.
>>
>> Yeah, those are good points. I agree the comments might be clearer,
>> but essentially ispell dictionaries are shared and everything else is
>> not.
>>
>> As for the memory consumption / unloading dicts - I agree that's
>> something we need to address. There used to be a way to specify a
>> memory limit and the ability to unload dictionaries explicitly, but
>> both features have been ditched. The assumption was that UNLOAD would
>> be introduced later, but that does not seem to have happened.
>
> I'll try to implement the syntax you suggested earlier:
>
> ALTER TEXT SEARCH DICTIONARY x UNLOAD/RELOAD
>
> The main point here is that UNLOAD/RELOAD can't release the memory
> immediately, because some other backend may pin a DSM.
>
> The second point we should consider (I think) is how we know which
> dictionary should be unloaded. There was such a function earlier,
> which was removed. But what about adding the information to the "\dFd"
> psql command output? It could be a column which shows whether a
> dictionary is loaded.

The UNLOAD capability is probably a good start, but it's entirely manual
and I wonder if it's putting too much burden on the user. I mean, the
user has to realize the dictionaries are using a lot of shared memory,
has to decide which to unload, and then has to do UNLOAD on it. That's
not quite straightforward, especially if there's no way to determine
which dictionaries are currently loaded and how much memory they use :-(

Of course, the problem is not exactly new - we don't show dictionaries
already loaded into private memory. The only thing we have is "unload"
capability by closing the connection. OTOH the memory consumption should
be much lower thanks to using shared memory. So I think the patch is an
improvement even in this regard.

I wonder if we could devise some simple cache eviction policy. We don't
have any memory limit GUC anymore, but maybe we could unload
dictionaries that were unused for a sufficient amount of time (a couple
of minutes or so). Of course, the question is when exactly that would
happen (it seems far too expensive to invoke on each dict access, and it
should happen even when the dicts are not accessed at all).

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PROPOSAL] Shared Ispell dictionaries
On 21.01.2019 02:43, Tomas Vondra wrote:
> On 1/20/19 11:21 PM, Andres Freund wrote:
>> On 2019-01-20 23:15:35 +0100, Tomas Vondra wrote:
>>> Thanks. I've reviewed v17 today and I haven't discovered any new
>>> issues so far. If everything goes fine and no one protests, I plan
>>> to get it committed over the next week or so.
>>
>> There doesn't seem to be any docs about what's needed to be able to
>> take advantage of shared dicts, and how to prevent them from
>> permanently taking up a significant share of memory.
>
> Yeah, those are good points. I agree the comments might be clearer,
> but essentially ispell dictionaries are shared and everything else is
> not.
>
> As for the memory consumption / unloading dicts - I agree that's
> something we need to address. There used to be a way to specify a
> memory limit and the ability to unload dictionaries explicitly, but
> both features have been ditched. The assumption was that UNLOAD would
> be introduced later, but that does not seem to have happened.

I'll try to implement the syntax you suggested earlier:

ALTER TEXT SEARCH DICTIONARY x UNLOAD/RELOAD

The main point here is that UNLOAD/RELOAD can't release the memory
immediately, because some other backend may pin a DSM.

The second point we should consider (I think) is how we know which
dictionary should be unloaded. There was such a function earlier, which
was removed. But what about adding the information to the "\dFd" psql
command output? It could be a column which shows whether a dictionary is
loaded.

-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
Re: [PROPOSAL] Shared Ispell dictionaries
On 1/20/19 11:21 PM, Andres Freund wrote:
> On 2019-01-20 23:15:35 +0100, Tomas Vondra wrote:
>> On 1/17/19 3:15 PM, Arthur Zakirov wrote:
>>> I attached files of new version of the patch, I applied your tweaks.
>>>
>>>> XXX All dictionaries, but only when there's invalid dictionary?
>>>
>>> I've made a little optimization. I introduced hashvalue into
>>> TSDictionaryCacheEntry. Now released only DSM of altered or dropped
>>> dictionaries.
>>>
>>>>> /* XXX not really a pointer, so the name is misleading */
>>>>
>>>> I think we don't need DictPointerData struct anymore, because only
>>>> ts_dict_shmem_release function needs it (see comments above) and we
>>>> only need it to hash search. I'll move all fields of
>>>> DictPointerData to TsearchDictKey struct.
>>>
>>> I was wrong, DictInitData also needs DictPointerData. I didn't
>>> remove DictPointerData, I renamed it to DictEntryData. Hope that it
>>> is a more appropriate name.
>>
>> Thanks. I've reviewed v17 today and I haven't discovered any new
>> issues so far. If everything goes fine and no one protests, I plan to
>> get it committed over the next week or so.
>
> There doesn't seem to be any docs about what's needed to be able to
> take advantage of shared dicts, and how to prevent them from
> permanently taking up a significant share of memory.

Yeah, those are good points. I agree the comments might be clearer, but
essentially ispell dictionaries are shared and everything else is not.

As for the memory consumption / unloading dicts - I agree that's
something we need to address. There used to be a way to specify a memory
limit and the ability to unload dictionaries explicitly, but both
features have been ditched. The assumption was that UNLOAD would be
introduced later, but that does not seem to have happened.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PROPOSAL] Shared Ispell dictionaries
On 2019-01-20 23:15:35 +0100, Tomas Vondra wrote: > On 1/17/19 3:15 PM, Arthur Zakirov wrote: > > I attached files of new version of the patch, I applied your tweaks. > > > >> XXX All dictionaries, but only when there's invalid dictionary? > > > > I've made a little optimization. I introduced hashvalue into > > TSDictionaryCacheEntry. Now released only DSM of altered or dropped > > dictionaries. > > > >> > /* XXX not really a pointer, so the name is misleading */ > >> > >> I think we don't need DictPointerData struct anymore, because only > >> ts_dict_shmem_release function needs it (see comments above) and we only > >> need it to hash search. I'll move all fields of DictPointerData to > >> TsearchDictKey struct. > > > > I was wrong, DictInitData also needs DictPointerData. I didn't remove > > DictPointerData, I renamed it to DictEntryData. Hope that it is a more > > appropriate name. > > > > Thanks. I've reviewed v17 today and I haven't discovered any new issues > so far. If everything goes fine and no one protests, I plan to get it > committed over the next week or so. There doesn't seem to be any docs about what's needed to be able to take advantage of shared dicts, and how to prevent them from permanently taking up a significant share of memory. Greetings, Andres Freund
Re: [PROPOSAL] Shared Ispell dictionaries
On 1/17/19 3:15 PM, Arthur Zakirov wrote: > I attached files of new version of the patch, I applied your tweaks. > >> XXX All dictionaries, but only when there's invalid dictionary? > > I've made a little optimization. I introduced hashvalue into > TSDictionaryCacheEntry. Now released only DSM of altered or dropped > dictionaries. > >> > /* XXX not really a pointer, so the name is misleading */ >> >> I think we don't need DictPointerData struct anymore, because only >> ts_dict_shmem_release function needs it (see comments above) and we only >> need it to hash search. I'll move all fields of DictPointerData to >> TsearchDictKey struct. > > I was wrong, DictInitData also needs DictPointerData. I didn't remove > DictPointerData, I renamed it to DictEntryData. Hope that it is a more > appropriate name. > Thanks. I've reviewed v17 today and I haven't discovered any new issues so far. If everything goes fine and no one protests, I plan to get it committed over the next week or so. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PROPOSAL] Shared Ispell dictionaries
I attached files of new version of the patch, I applied your tweaks. > XXX All dictionaries, but only when there's invalid dictionary? I've made a little optimization. I introduced hashvalue into TSDictionaryCacheEntry. Now released only DSM of altered or dropped dictionaries. > > /* XXX not really a pointer, so the name is misleading */ > > I think we don't need DictPointerData struct anymore, because only > ts_dict_shmem_release function needs it (see comments above) and we only > need it to hash search. I'll move all fields of DictPointerData to > TsearchDictKey struct. I was wrong, DictInitData also needs DictPointerData. I didn't remove DictPointerData, I renamed it to DictEntryData. Hope that it is a more appropriate name. -- Arthur Zakirov Postgres Professional: http://www.postgrespro.com Russian Postgres Company >From c20c171c2107efc6f87b688af0feecf2f98fcd69 Mon Sep 17 00:00:00 2001 From: Arthur Zakirov Date: Thu, 17 Jan 2019 14:27:32 +0300 Subject: [PATCH 1/4] Fix-ispell-memory-handling --- src/backend/tsearch/spell.c | 9 +++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/src/backend/tsearch/spell.c b/src/backend/tsearch/spell.c index eb39466b22..eb8416ce7f 100644 --- a/src/backend/tsearch/spell.c +++ b/src/backend/tsearch/spell.c @@ -78,6 +78,8 @@ #define tmpalloc(sz) MemoryContextAlloc(Conf->buildCxt, (sz)) #define tmpalloc0(sz) MemoryContextAllocZero(Conf->buildCxt, (sz)) +#define tmpstrdup(str) MemoryContextStrdup(Conf->buildCxt, (str)) + /* * Prepare for constructing an ISpell dictionary. * @@ -498,7 +500,7 @@ NIAddSpell(IspellDict *Conf, const char *word, const char *flag) Conf->Spell[Conf->nspell] = (SPELL *) tmpalloc(SPELLHDRSZ + strlen(word) + 1); strcpy(Conf->Spell[Conf->nspell]->word, word); Conf->Spell[Conf->nspell]->p.flag = (*flag != '\0') - ? cpstrdup(Conf, flag) : VoidString; + ? 
tmpstrdup(flag) : VoidString; Conf->nspell++; } @@ -1040,7 +1042,7 @@ setCompoundAffixFlagValue(IspellDict *Conf, CompoundAffixFlag *entry, entry->flag.i = i; } else - entry->flag.s = cpstrdup(Conf, s); + entry->flag.s = tmpstrdup(s); entry->flagMode = Conf->flagMode; entry->value = val; @@ -1541,6 +1543,9 @@ nextline: return; isnewformat: + pfree(recoded); + pfree(pstr); + if (oldformat) ereport(ERROR, (errcode(ERRCODE_CONFIG_FILE_ERROR), -- 2.20.1 >From ca45e4ca314bdf8bed1a47796afb3e86c6fcd684 Mon Sep 17 00:00:00 2001 From: Arthur Zakirov Date: Thu, 17 Jan 2019 15:05:44 +0300 Subject: [PATCH 2/4] Change-tmplinit-argument --- contrib/dict_int/dict_int.c | 4 +- contrib/dict_xsyn/dict_xsyn.c| 4 +- contrib/unaccent/unaccent.c | 4 +- src/backend/commands/tsearchcmds.c | 10 - src/backend/snowball/dict_snowball.c | 4 +- src/backend/tsearch/dict_ispell.c| 4 +- src/backend/tsearch/dict_simple.c| 4 +- src/backend/tsearch/dict_synonym.c | 4 +- src/backend/tsearch/dict_thesaurus.c | 4 +- src/backend/utils/cache/ts_cache.c | 13 +- src/include/tsearch/ts_cache.h | 4 ++ src/include/tsearch/ts_public.h | 67 ++-- 12 files changed, 105 insertions(+), 21 deletions(-) diff --git a/contrib/dict_int/dict_int.c b/contrib/dict_int/dict_int.c index 628b9769c3..ddde55eee4 100644 --- a/contrib/dict_int/dict_int.c +++ b/contrib/dict_int/dict_int.c @@ -30,7 +30,7 @@ PG_FUNCTION_INFO_V1(dintdict_lexize); Datum dintdict_init(PG_FUNCTION_ARGS) { - List *dictoptions = (List *) PG_GETARG_POINTER(0); + DictInitData *init_data = (DictInitData *) PG_GETARG_POINTER(0); DictInt*d; ListCell *l; @@ -38,7 +38,7 @@ dintdict_init(PG_FUNCTION_ARGS) d->maxlen = 6; d->rejectlong = false; - foreach(l, dictoptions) + foreach(l, init_data->dict_options) { DefElem*defel = (DefElem *) lfirst(l); diff --git a/contrib/dict_xsyn/dict_xsyn.c b/contrib/dict_xsyn/dict_xsyn.c index 509e14aee0..15b1a0033a 100644 --- a/contrib/dict_xsyn/dict_xsyn.c +++ b/contrib/dict_xsyn/dict_xsyn.c @@ -140,7 +140,7 @@ 
read_dictionary(DictSyn *d, const char *filename) Datum dxsyn_init(PG_FUNCTION_ARGS) { - List *dictoptions = (List *) PG_GETARG_POINTER(0); + DictInitData *init_data = (DictInitData *) PG_GETARG_POINTER(0); DictSyn*d; ListCell *l; char *filename = NULL; @@ -153,7 +153,7 @@ dxsyn_init(PG_FUNCTION_ARGS) d->matchsynonyms = false; d->keepsynonyms = true; - foreach(l, dictoptions) + foreach(l, init_data->dict_options) { DefElem*defel = (DefElem *) lfirst(l); diff --git a/contrib/unaccent/unaccent.c b/contrib/unaccent/unaccent.c index fc5176e338..f3663cefd0 100644 --- a/contrib/unaccent/unaccent.c +++ b/contrib/unaccent/unaccent.c @@ -270,12 +270,12 @@ PG_FUNCTION_INFO_V1(unaccent_init); Datum unaccent_init(PG_FUNCTION_ARGS) { - List *dictoptions = (List *) PG_GETARG_POINTER(0); + DictInitData *init_data = (DictInitData *) PG_GETARG_POINTER(0); TrieChar *rootTrie = NULL; bool fileloaded
Re: [PROPOSAL] Shared Ispell dictionaries
Hello Tomas, On 16.01.2019 03:23, Tomas Vondra wrote: I've looked at the patch today, and in general it seems quite solid to me. I do have a couple of minor points 1) I think the comments need more work. Instead of describing all the individual changes here, I've outlined those improvements in attached patches (see the attached "tweaks" patches). Some of it is formatting, minor rewording or larger changes. Some comments are rather redundant (e.g. the one before calls to release the DSM segment). Thank you! 2) It's not quite clear to me why we need DictInitData, which simply combines DictPointerData and list of options. It seems as if the only point is to pass a single parameter to the init function, but is it worth it? Why not get rid of DictInitData entirely and pass two parameters instead? Initially, the init method had two parameters. But in the v7 patch I added the DictInitData struct in place of the two parameters (list of options and DictPointerData): https://www.postgresql.org/message-id/20180319110648.GA32319%40zakirov.localdomain I don't see a way to change a template's init method from init_method(internal) to init_method(internal,internal) in an extension's upgrade script. If I'm not mistaken we need new syntax here, like ALTER TEXT SEARCH TEMPLATE. Thoughts? 3) I find it a bit cumbersome that before each ts_dict_shmem_release call we construct a dummy DictPointerData value. Why not pass individual parameters and construct the struct in the function? Agreed, it may look too verbose. I'll change it. 4) The reference to max_shared_dictionaries_size is obsolete, because there's no such limit anymore. Yeah, I'll fix it. > /* XXX not really a pointer, so the name is misleading */ I think we don't need DictPointerData struct anymore, because only ts_dict_shmem_release function needs it (see comments above) and we only need it to hash search. I'll move all fields of DictPointerData to TsearchDictKey struct. 
> XXX "supported" is not the same as "all ispell dicts behave like that". I'll reword the sentence. -- Arthur Zakirov Postgres Professional: http://www.postgrespro.com Russian Postgres Company
Re: [PROPOSAL] Shared Ispell dictionaries
Hello Arthur, I've looked at the patch today, and in general it seems quite solid to me. I do have a couple of minor points 1) I think the comments need more work. Instead of describing all the individual changes here, I've outlined those improvements in attached patches (see the attached "tweaks" patches). Some of it is formatting, minor rewording or larger changes. Some comments are rather redundant (e.g. the one before calls to release the DSM segment). 2) It's not quite clear to me why we need DictInitData, which simply combines DictPointerData and list of options. It seems as if the only point is to pass a single parameter to the init function, but is it worth it? Why not get rid of DictInitData entirely and pass two parameters instead? 3) I find it a bit cumbersome that before each ts_dict_shmem_release call we construct a dummy DictPointerData value. Why not pass individual parameters and construct the struct in the function? 4) The reference to max_shared_dictionaries_size is obsolete, because there's no such limit anymore. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services >From e76d34bcb1a84127f9b4402f0147642d77505cc2 Mon Sep 17 00:00:00 2001 From: Tomas Vondra Date: Tue, 15 Jan 2019 22:16:35 +0100 Subject: [PATCH 1/7] Fix ispell memory handling --- src/backend/tsearch/spell.c | 9 +++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/src/backend/tsearch/spell.c b/src/backend/tsearch/spell.c index eb39466b22..eb8416ce7f 100644 --- a/src/backend/tsearch/spell.c +++ b/src/backend/tsearch/spell.c @@ -78,6 +78,8 @@ #define tmpalloc(sz) MemoryContextAlloc(Conf->buildCxt, (sz)) #define tmpalloc0(sz) MemoryContextAllocZero(Conf->buildCxt, (sz)) +#define tmpstrdup(str) MemoryContextStrdup(Conf->buildCxt, (str)) + /* * Prepare for constructing an ISpell dictionary. 
* @@ -498,7 +500,7 @@ NIAddSpell(IspellDict *Conf, const char *word, const char *flag) Conf->Spell[Conf->nspell] = (SPELL *) tmpalloc(SPELLHDRSZ + strlen(word) + 1); strcpy(Conf->Spell[Conf->nspell]->word, word); Conf->Spell[Conf->nspell]->p.flag = (*flag != '\0') - ? cpstrdup(Conf, flag) : VoidString; + ? tmpstrdup(flag) : VoidString; Conf->nspell++; } @@ -1040,7 +1042,7 @@ setCompoundAffixFlagValue(IspellDict *Conf, CompoundAffixFlag *entry, entry->flag.i = i; } else - entry->flag.s = cpstrdup(Conf, s); + entry->flag.s = tmpstrdup(s); entry->flagMode = Conf->flagMode; entry->value = val; @@ -1541,6 +1543,9 @@ nextline: return; isnewformat: + pfree(recoded); + pfree(pstr); + if (oldformat) ereport(ERROR, (errcode(ERRCODE_CONFIG_FILE_ERROR), -- 2.17.2 >From dbb3cc3b7e7c560472cfa5efa77598f4992e0b70 Mon Sep 17 00:00:00 2001 From: Tomas Vondra Date: Tue, 15 Jan 2019 22:17:32 +0100 Subject: [PATCH 2/7] Change tmplinit argument --- contrib/dict_int/dict_int.c | 4 +- contrib/dict_xsyn/dict_xsyn.c| 4 +- contrib/unaccent/unaccent.c | 4 +- src/backend/commands/tsearchcmds.c | 10 +++- src/backend/snowball/dict_snowball.c | 4 +- src/backend/tsearch/dict_ispell.c| 4 +- src/backend/tsearch/dict_simple.c| 4 +- src/backend/tsearch/dict_synonym.c | 4 +- src/backend/tsearch/dict_thesaurus.c | 4 +- src/backend/utils/cache/ts_cache.c | 19 +++- src/include/tsearch/ts_cache.h | 4 ++ src/include/tsearch/ts_public.h | 69 ++-- 12 files changed, 113 insertions(+), 21 deletions(-) diff --git a/contrib/dict_int/dict_int.c b/contrib/dict_int/dict_int.c index 628b9769c3..ddde55eee4 100644 --- a/contrib/dict_int/dict_int.c +++ b/contrib/dict_int/dict_int.c @@ -30,7 +30,7 @@ PG_FUNCTION_INFO_V1(dintdict_lexize); Datum dintdict_init(PG_FUNCTION_ARGS) { - List *dictoptions = (List *) PG_GETARG_POINTER(0); + DictInitData *init_data = (DictInitData *) PG_GETARG_POINTER(0); DictInt*d; ListCell *l; @@ -38,7 +38,7 @@ dintdict_init(PG_FUNCTION_ARGS) d->maxlen = 6; d->rejectlong = false; - foreach(l, 
dictoptions) + foreach(l, init_data->dict_options) { DefElem*defel = (DefElem *) lfirst(l); diff --git a/contrib/dict_xsyn/dict_xsyn.c b/contrib/dict_xsyn/dict_xsyn.c index 509e14aee0..15b1a0033a 100644 --- a/contrib/dict_xsyn/dict_xsyn.c +++ b/contrib/dict_xsyn/dict_xsyn.c @@ -140,7 +140,7 @@ read_dictionary(DictSyn *d, const char *filename) Datum dxsyn_init(PG_FUNCTION_ARGS) { - List *dictoptions = (List *) PG_GETARG_POINTER(0); + DictInitData *init_data = (DictInitData *) PG_GETARG_POINTER(0); DictSyn*d; ListCell *l; char *filename = NULL; @@ -153,7 +153,7 @@ dxsyn_init(PG_FUNCTION_ARGS) d->matchsynonyms = false; d->keepsynonyms = true; - foreach(l, dictoptions) + foreach(l, init_data->dict_options) { DefElem*defel = (DefElem *) lfirst(l); diff --git a/contrib/unaccent/unaccent.c b/contrib/unaccent/unaccent.c index
Re: [PROPOSAL] Shared Ispell dictionaries
On 01.10.2018 12:22, Arthur Zakirov wrote: On Thu, Jun 14, 2018 at 11:40:17AM +0300, Arthur Zakirov wrote: I attached new version of the patch. The patch still applies to HEAD. I moved it to the next commitfest. Here is the rebased patch. I also updated copyright in ts_shared.h and ts_shared.c. -- Arthur Zakirov Postgres Professional: http://www.postgrespro.com Russian Postgres Company diff --git a/src/backend/tsearch/spell.c b/src/backend/tsearch/spell.c index eb39466b22..eb8416ce7f 100644 --- a/src/backend/tsearch/spell.c +++ b/src/backend/tsearch/spell.c @@ -78,6 +78,8 @@ #define tmpalloc(sz) MemoryContextAlloc(Conf->buildCxt, (sz)) #define tmpalloc0(sz) MemoryContextAllocZero(Conf->buildCxt, (sz)) +#define tmpstrdup(str) MemoryContextStrdup(Conf->buildCxt, (str)) + /* * Prepare for constructing an ISpell dictionary. * @@ -498,7 +500,7 @@ NIAddSpell(IspellDict *Conf, const char *word, const char *flag) Conf->Spell[Conf->nspell] = (SPELL *) tmpalloc(SPELLHDRSZ + strlen(word) + 1); strcpy(Conf->Spell[Conf->nspell]->word, word); Conf->Spell[Conf->nspell]->p.flag = (*flag != '\0') - ? cpstrdup(Conf, flag) : VoidString; + ? 
tmpstrdup(flag) : VoidString; Conf->nspell++; } @@ -1040,7 +1042,7 @@ setCompoundAffixFlagValue(IspellDict *Conf, CompoundAffixFlag *entry, entry->flag.i = i; } else - entry->flag.s = cpstrdup(Conf, s); + entry->flag.s = tmpstrdup(s); entry->flagMode = Conf->flagMode; entry->value = val; @@ -1541,6 +1543,9 @@ nextline: return; isnewformat: + pfree(recoded); + pfree(pstr); + if (oldformat) ereport(ERROR, (errcode(ERRCODE_CONFIG_FILE_ERROR), diff --git a/contrib/dict_int/dict_int.c b/contrib/dict_int/dict_int.c index 628b9769c3..ddde55eee4 100644 --- a/contrib/dict_int/dict_int.c +++ b/contrib/dict_int/dict_int.c @@ -30,7 +30,7 @@ PG_FUNCTION_INFO_V1(dintdict_lexize); Datum dintdict_init(PG_FUNCTION_ARGS) { - List *dictoptions = (List *) PG_GETARG_POINTER(0); + DictInitData *init_data = (DictInitData *) PG_GETARG_POINTER(0); DictInt*d; ListCell *l; @@ -38,7 +38,7 @@ dintdict_init(PG_FUNCTION_ARGS) d->maxlen = 6; d->rejectlong = false; - foreach(l, dictoptions) + foreach(l, init_data->dict_options) { DefElem*defel = (DefElem *) lfirst(l); diff --git a/contrib/dict_xsyn/dict_xsyn.c b/contrib/dict_xsyn/dict_xsyn.c index 509e14aee0..15b1a0033a 100644 --- a/contrib/dict_xsyn/dict_xsyn.c +++ b/contrib/dict_xsyn/dict_xsyn.c @@ -140,7 +140,7 @@ read_dictionary(DictSyn *d, const char *filename) Datum dxsyn_init(PG_FUNCTION_ARGS) { - List *dictoptions = (List *) PG_GETARG_POINTER(0); + DictInitData *init_data = (DictInitData *) PG_GETARG_POINTER(0); DictSyn*d; ListCell *l; char *filename = NULL; @@ -153,7 +153,7 @@ dxsyn_init(PG_FUNCTION_ARGS) d->matchsynonyms = false; d->keepsynonyms = true; - foreach(l, dictoptions) + foreach(l, init_data->dict_options) { DefElem*defel = (DefElem *) lfirst(l); diff --git a/contrib/unaccent/unaccent.c b/contrib/unaccent/unaccent.c index fc5176e338..f3663cefd0 100644 --- a/contrib/unaccent/unaccent.c +++ b/contrib/unaccent/unaccent.c @@ -270,12 +270,12 @@ PG_FUNCTION_INFO_V1(unaccent_init); Datum unaccent_init(PG_FUNCTION_ARGS) { - List 
*dictoptions = (List *) PG_GETARG_POINTER(0); + DictInitData *init_data = (DictInitData *) PG_GETARG_POINTER(0); TrieChar *rootTrie = NULL; bool fileloaded = false; ListCell *l; - foreach(l, dictoptions) + foreach(l, init_data->dict_options) { DefElem*defel = (DefElem *) lfirst(l); diff --git a/src/backend/commands/tsearchcmds.c b/src/backend/commands/tsearchcmds.c index cda21675f0..93a71adc5d 100644 --- a/src/backend/commands/tsearchcmds.c +++ b/src/backend/commands/tsearchcmds.c @@ -390,17 +390,25 @@ verify_dictoptions(Oid tmplId, List *dictoptions) } else { + DictInitData init_data; + /* * Copy the options just in case init method thinks it can scribble on * them ... */ dictoptions = copyObject(dictoptions); + init_data.dict_options = dictoptions; + init_data.dict.id = InvalidOid; + init_data.dict.xmin = InvalidTransactionId; + init_data.dict.xmax = InvalidTransactionId; + ItemPointerSetInvalid(&init_data.dict.tid); + /* * Call the init method and see if it complains. We don't worry about * it leaking memory, since our command will soon be over anyway. */ - (void) OidFunctionCall1(initmethod, PointerGetDatum(dictoptions)); + (void) OidFunctionCall1(initmethod, PointerGetDatum(&init_data)); } ReleaseSysCache(tup); diff --git a/src/backend/snowball/dict_snowball.c b/src/backend/snowball/dict_snowball.c index 5166738310..f30f29865c 100644 --- a/src/backend/snowball/dict_snowball.c +++ b/src/backend/snowball/dict_snowball.c @@ -201,14 +201,14 @@ locate_stem_module(DictSnowball *d, const char *lang) Datum dsnowball_init(PG_FUNCTION_ARGS) { - List *dictoptions = (List *) PG_GETARG_POINTER(0); + DictInitData *init_data =
Re: [PROPOSAL] Shared Ispell dictionaries
On Thu, Jun 14, 2018 at 11:40:17AM +0300, Arthur Zakirov wrote: > I attached new version of the patch. The patch still applies to HEAD. I moved it to the next commitfest. -- Arthur Zakirov Postgres Professional: http://www.postgrespro.com Russian Postgres Company
Re: [PROPOSAL] Shared Ispell dictionaries
On Wed, May 16, 2018 at 02:36:33PM +0300, Arthur Zakirov wrote: > ... I attached the rebased patch. I attached new version of the patch. I found a bug when CompoundAffix, SuffixNodes, PrefixNodes, DictNodes of IspellDictData structure are empty. Now they have terminating entry and therefore they have at least one node entry. -- Arthur Zakirov Postgres Professional: http://www.postgrespro.com Russian Postgres Company diff --git a/src/backend/tsearch/spell.c b/src/backend/tsearch/spell.c index 6f5b635413..09297e384c 100644 --- a/src/backend/tsearch/spell.c +++ b/src/backend/tsearch/spell.c @@ -78,6 +78,8 @@ #define tmpalloc(sz) MemoryContextAlloc(Conf->buildCxt, (sz)) #define tmpalloc0(sz) MemoryContextAllocZero(Conf->buildCxt, (sz)) +#define tmpstrdup(str) MemoryContextStrdup(Conf->buildCxt, (str)) + /* * Prepare for constructing an ISpell dictionary. * @@ -498,7 +500,7 @@ NIAddSpell(IspellDict *Conf, const char *word, const char *flag) Conf->Spell[Conf->nspell] = (SPELL *) tmpalloc(SPELLHDRSZ + strlen(word) + 1); strcpy(Conf->Spell[Conf->nspell]->word, word); Conf->Spell[Conf->nspell]->p.flag = (*flag != '\0') - ? cpstrdup(Conf, flag) : VoidString; + ? 
tmpstrdup(flag) : VoidString; Conf->nspell++; } @@ -1040,7 +1042,7 @@ setCompoundAffixFlagValue(IspellDict *Conf, CompoundAffixFlag *entry, entry->flag.i = i; } else - entry->flag.s = cpstrdup(Conf, s); + entry->flag.s = tmpstrdup(s); entry->flagMode = Conf->flagMode; entry->value = val; @@ -1541,6 +1543,9 @@ nextline: return; isnewformat: + pfree(recoded); + pfree(pstr); + if (oldformat) ereport(ERROR, (errcode(ERRCODE_CONFIG_FILE_ERROR), diff --git a/contrib/dict_int/dict_int.c b/contrib/dict_int/dict_int.c index 56ede37089..8dd4959028 100644 --- a/contrib/dict_int/dict_int.c +++ b/contrib/dict_int/dict_int.c @@ -30,7 +30,7 @@ PG_FUNCTION_INFO_V1(dintdict_lexize); Datum dintdict_init(PG_FUNCTION_ARGS) { - List *dictoptions = (List *) PG_GETARG_POINTER(0); + DictInitData *init_data = (DictInitData *) PG_GETARG_POINTER(0); DictInt*d; ListCell *l; @@ -38,7 +38,7 @@ dintdict_init(PG_FUNCTION_ARGS) d->maxlen = 6; d->rejectlong = false; - foreach(l, dictoptions) + foreach(l, init_data->dict_options) { DefElem*defel = (DefElem *) lfirst(l); diff --git a/contrib/dict_xsyn/dict_xsyn.c b/contrib/dict_xsyn/dict_xsyn.c index a79ece240c..0b8a32d459 100644 --- a/contrib/dict_xsyn/dict_xsyn.c +++ b/contrib/dict_xsyn/dict_xsyn.c @@ -140,7 +140,7 @@ read_dictionary(DictSyn *d, const char *filename) Datum dxsyn_init(PG_FUNCTION_ARGS) { - List *dictoptions = (List *) PG_GETARG_POINTER(0); + DictInitData *init_data = (DictInitData *) PG_GETARG_POINTER(0); DictSyn*d; ListCell *l; char *filename = NULL; @@ -153,7 +153,7 @@ dxsyn_init(PG_FUNCTION_ARGS) d->matchsynonyms = false; d->keepsynonyms = true; - foreach(l, dictoptions) + foreach(l, init_data->dict_options) { DefElem*defel = (DefElem *) lfirst(l); diff --git a/contrib/unaccent/unaccent.c b/contrib/unaccent/unaccent.c index 247c202755..2a2fbee5fa 100644 --- a/contrib/unaccent/unaccent.c +++ b/contrib/unaccent/unaccent.c @@ -267,12 +267,12 @@ PG_FUNCTION_INFO_V1(unaccent_init); Datum unaccent_init(PG_FUNCTION_ARGS) { - List 
*dictoptions = (List *) PG_GETARG_POINTER(0); + DictInitData *init_data = (DictInitData *) PG_GETARG_POINTER(0); TrieChar *rootTrie = NULL; bool fileloaded = false; ListCell *l; - foreach(l, dictoptions) + foreach(l, init_data->dict_options) { DefElem*defel = (DefElem *) lfirst(l); diff --git a/src/backend/commands/tsearchcmds.c b/src/backend/commands/tsearchcmds.c index 3a843512d1..3753e32b2c 100644 --- a/src/backend/commands/tsearchcmds.c +++ b/src/backend/commands/tsearchcmds.c @@ -386,17 +386,25 @@ verify_dictoptions(Oid tmplId, List *dictoptions) } else { + DictInitData init_data; + /* * Copy the options just in case init method thinks it can scribble on * them ... */ dictoptions = copyObject(dictoptions); + init_data.dict_options = dictoptions; + init_data.dict.id = InvalidOid; + init_data.dict.xmin = InvalidTransactionId; + init_data.dict.xmax = InvalidTransactionId; + ItemPointerSetInvalid(&init_data.dict.tid); + /* * Call the init method and see if it complains. We don't worry about * it leaking memory,
Re: [PROPOSAL] Shared Ispell dictionaries
On Thu, May 17, 2018 at 02:14:07PM -0400, Robert Haas wrote: > On Thu, May 17, 2018 at 1:52 PM, Tom Lane wrote: > > Do we actually need to worry about unmapping promptly on DROP TEXT > > DICTIONARY? It seems like the only downside of not doing that is that > > we'd leak some address space until process exit. If you were thrashing > > dictionaries at some unreasonable rate on a 32-bit host, you might > > eventually run some sessions out of address space; but that doesn't seem > > like a situation that's so common that we need fragile coding to avoid it. > > I'm not sure what the situation is here. I think this case may arise when you continuously create and drop a lot of dictionaries; different connections work with them concurrently, and at some point a connection stops using text search, so its pinned segments are never unpinned. But I'm not sure whether this is a real-world case. Text search configuration changes should be very infrequent (as noted in the InvalidateTSCacheCallBack commentary). -- Arthur Zakirov Postgres Professional: http://www.postgrespro.com Russian Postgres Company
Re: [PROPOSAL] Shared Ispell dictionaries
On Thu, May 17, 2018 at 10:18:56AM -0400, Tom Lane wrote: > I think the point you've not addressed is that "syscache callback > occurred" does not equate to "object was dropped". Can the code > survive having this occur at any invalidation point? > (CLOBBER_CACHE_ALWAYS testing would soon expose any fallacy there.) Thank you for the idea of testing with CLOBBER_CACHE_ALWAYS. I built postgres with it and ran the regression tests. I tested both approaches, and at first glance they pass the tests. There are no concurrent tests for the text search feature with two or more connections; maybe it would be useful to add such tests. I did this manually, but it would be better to have a script. -- Arthur Zakirov Postgres Professional: http://www.postgrespro.com Russian Postgres Company
Re: [PROPOSAL] Shared Ispell dictionaries
On Thu, May 17, 2018 at 1:52 PM, Tom Lane wrote: > Robert Haas writes: >> On Thu, May 17, 2018 at 10:18 AM, Tom Lane wrote: >>> I think the point you've not addressed is that "syscache callback >>> occurred" does not equate to "object was dropped". Can the code >>> survive having this occur at any invalidation point? >>> (CLOBBER_CACHE_ALWAYS testing would soon expose any fallacy there.) > >> Well, I'm not advocating for a lack of testing, and >> CLOBBER_CACHE_ALWAYS testing is a good idea. However, I suspect that >> calling dsm_detach() from a syscache callback should be fine. >> Obviously there will be trouble if the surrounding code is still using >> that mapping, but that would be a bug at some higher level, like using >> an object without locking it. > > No, you're clearly not getting the point. You could have an absolutely > airtight exclusive lock of any description whatsoever, and that would > provide no guarantee at all that you don't get a cache flush callback. > It's only a cache, not a catalog, and it can get flushed for any reason > or no reason. (That's why we have pin counts on catcache and relcache > entries, rather than assuming that locking the corresponding object is > enough.) So I think it's highly likely that unmapping in a syscache > callback is going to lead quickly to SIGSEGV. The only way it would not > is if we keep the shared dictionary mapped only in short straight-line > code segments that never do any other catalog accesses ... which seems > awkward, inefficient, and error-prone. Yeah, that's true, but again, you can work around that problem. A DSM mapping is fundamentally not that different from a backend-private memory allocation. If you can avoid freeing memory while you're referencing it -- as the catcache and the syscache clearly do -- you can avoid it here, too. > Do we actually need to worry about unmapping promptly on DROP TEXT > DICTIONARY? 
It seems like the only downside of not doing that is that > we'd leak some address space until process exit. If you were thrashing > dictionaries at some unreasonable rate on a 32-bit host, you might > eventually run some sessions out of address space; but that doesn't seem > like a situation that's so common that we need fragile coding to avoid it. I'm not sure what the situation is here. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [PROPOSAL] Shared Ispell dictionaries
Robert Haas writes: > On Thu, May 17, 2018 at 10:18 AM, Tom Lane wrote: >> I think the point you've not addressed is that "syscache callback >> occurred" does not equate to "object was dropped". Can the code >> survive having this occur at any invalidation point? >> (CLOBBER_CACHE_ALWAYS testing would soon expose any fallacy there.) > Well, I'm not advocating for a lack of testing, and > CLOBBER_CACHE_ALWAYS testing is a good idea. However, I suspect that > calling dsm_detach() from a syscache callback should be fine. > Obviously there will be trouble if the surrounding code is still using > that mapping, but that would be a bug at some higher level, like using > an object without locking it. No, you're clearly not getting the point. You could have an absolutely airtight exclusive lock of any description whatsoever, and that would provide no guarantee at all that you don't get a cache flush callback. It's only a cache, not a catalog, and it can get flushed for any reason or no reason. (That's why we have pin counts on catcache and relcache entries, rather than assuming that locking the corresponding object is enough.) So I think it's highly likely that unmapping in a syscache callback is going to lead quickly to SIGSEGV. The only way it would not is if we keep the shared dictionary mapped only in short straight-line code segments that never do any other catalog accesses ... which seems awkward, inefficient, and error-prone. Do we actually need to worry about unmapping promptly on DROP TEXT DICTIONARY? It seems like the only downside of not doing that is that we'd leak some address space until process exit. If you were thrashing dictionaries at some unreasonable rate on a 32-bit host, you might eventually run some sessions out of address space; but that doesn't seem like a situation that's so common that we need fragile coding to avoid it. regards, tom lane
Re: [PROPOSAL] Shared Ispell dictionaries
On Thu, May 17, 2018 at 10:18 AM, Tom Lane wrote: > Robert Haas writes: >> ... Assuming that we can >> convince ourselves that that much is OK, I don't see why using a >> syscache callback to help ensure that the mappings are blown away in >> an at-least-somewhat-timely fashion is worse than any other approach. > > I think the point you've not addressed is that "syscache callback > occurred" does not equate to "object was dropped". Can the code > survive having this occur at any invalidation point? > (CLOBBER_CACHE_ALWAYS testing would soon expose any fallacy there.) Well, I'm not advocating for a lack of testing, and CLOBBER_CACHE_ALWAYS testing is a good idea. However, I suspect that calling dsm_detach() from a syscache callback should be fine. Obviously there will be trouble if the surrounding code is still using that mapping, but that would be a bug at some higher level, like using an object without locking it. And there will be trouble if you register an on_dsm_detach callback that does something strange, but the ones that the core code installs (when you use shm_mq, for example) should be safe. And there will be trouble if you're not careful about memory contexts, because someplace you probably need to remember that you detached from that DSM so you don't try to do it again, and you'd better be sure you have the right context selected when updating your data structures. But it all seems pretty solvable. I think. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [PROPOSAL] Shared Ispell dictionaries
On Thu, May 17, 2018 at 09:57:59AM -0400, Robert Haas wrote: > I think you and Tom have misunderstood each other somehow. If you > look at CommitTransaction(), you will see a comment that says: Oh, I see. You are right. > Also, there is no absolute prohibition on kernel calls in post-commit > cleanup, or in no-fail code in general. Thank you for the explanation! The current approach depends on syscache callbacks anyway. Backend 2 (from the example above) knows whether it is necessary to unpin segments after the syscache callback was called. Tom pointed out below that callbacks occur on various events. So I think I should check the current approach with CLOBBER_CACHE_ALWAYS too; it could reveal some problems in the current patch. Then, if everything is OK, I'll check the other approach (unmapping in the TS syscache callback) with CLOBBER_CACHE_ALWAYS as well. -- Arthur Zakirov Postgres Professional: http://www.postgrespro.com Russian Postgres Company
Re: [PROPOSAL] Shared Ispell dictionaries
Robert Haas writes: > ... Assuming that we can > convince ourselves that that much is OK, I don't see why using a > syscache callback to help ensure that the mappings are blown away in > an at-least-somewhat-timely fashion is worse than any other approach. I think the point you've not addressed is that "syscache callback occurred" does not equate to "object was dropped". Can the code survive having this occur at any invalidation point? (CLOBBER_CACHE_ALWAYS testing would soon expose any fallacy there.) regards, tom lane
Re: [PROPOSAL] Shared Ispell dictionaries
On Wed, May 16, 2018 at 4:42 PM, Arthur Zakirov wrote: > I haven't deep knowledge about guts of invalidation callbacks. It seems > that there is problem with it. Tom pointed above: > >> > I'm not sure that I understood the second case correclty. Can cache >> > invalidation help in this case? I don't have confident knowledge of cache >> > invalidation. It seems to me that InvalidateTSCacheCallBack() should >> > release segment after commit. >> >> "Release after commit" sounds like a pretty dangerous design to me, >> because a release necessarily implies some kernel calls, which could >> fail. We can't afford to inject steps that might fail into post-commit >> cleanup (because it's too late to recover by failing the transaction). >> It'd be better to do cleanup while searching for a dictionary to use. > > But it is possible that I misunderstood his note. I think you and Tom have misunderstood each other somehow. If you look at CommitTransaction(), you will see a comment that says: * This is all post-commit cleanup. Note that if an error is raised here, * it's too late to abort the transaction. This should be just * noncritical resource releasing. Between that point and the end of that function, we shouldn't do anything that throws an error, because the transaction is already committed and it's too late to change our mind. But if session A drops an object, session B is not going to get a callback to InvalidateTSCacheCallBack at that point. It's going to happen sometime in the middle of the transaction, like when it next tries to lock a relation or something. So Tom's complaint is irrelevant in that scenario. Also, there is no absolute prohibition on kernel calls in post-commit cleanup, or in no-fail code in general. For example, the RESOURCE_RELEASE_AFTER_LOCKS phase of resowner cleanup calls FileClose(). 
That's actually completely alarming when you really think about it, because one of the documented return values for close() is EIO, which certainly represents a very dangerous kind of failure -- see nearby threads about fsync-safety. Transaction abort acquires and releases numerous LWLocks, which can result in kernel calls that could fail. We're OK with that because, in practice, it never happens. Unmapping a DSM segment is probably about as safe as acquiring and releasing an LWLock, maybe safer. On my MacBook, the only documented return value for munmap is EINVAL, and any such error would indicate a PostgreSQL bug (or a kernel bug, or a cosmic ray hit). I checked a Linux system; things there are less clear, because mmap and munmap share a single man page, and mmap can fail for all kinds of reasons. But very few of the listed error codes look like things that could legitimately happen during munmap. Also, if munmap did fail (or shmdt/shmctl if using System V shared memory), it would be reported as a WARNING, not an ERROR, so we'd still be sorta OK. I think the only real question here is whether it's safe, at a high level, to drop the object at time T0 and have various backends drop the mapping at unpredictable later times T1, T2, ... all greater than T0. Generally, one wants to remove all references to an object before the object itself, which in this case we can't. Assuming that we can convince ourselves that that much is OK, I don't see why using a syscache callback to help ensure that the mappings are blown away in an at-least-somewhat-timely fashion is worse than any other approach. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [PROPOSAL] Shared Ispell dictionaries
On Wed, May 16, 2018 at 09:33:46AM -0400, Robert Haas wrote: > > In sum, I think the problem is mostly solved. Backend 2 unpins the > > segment in next ts_lexize() call. But if backend 2 doesn't call > > ts_lexize() (or other TS function) anymore the segment will remain mapped. > > It is the only problem I see for now. > > Maybe you could use CacheRegisterSyscacheCallback to get a callback > when the backend notices that a DROP has occurred. Yes, it was the first approach. DSM segments were unpinned in InvalidateTSCacheCallBack() in that approach, which is registered using CacheRegisterSyscacheCallback(). I don't have deep knowledge of the guts of invalidation callbacks. It seems that there is a problem with it. Tom pointed out above: > > I'm not sure that I understood the second case correclty. Can cache > > invalidation help in this case? I don't have confident knowledge of cache > > invalidation. It seems to me that InvalidateTSCacheCallBack() should > > release segment after commit. > > "Release after commit" sounds like a pretty dangerous design to me, > because a release necessarily implies some kernel calls, which could > fail. We can't afford to inject steps that might fail into post-commit > cleanup (because it's too late to recover by failing the transaction). > It'd be better to do cleanup while searching for a dictionary to use. But it is possible that I misunderstood his note. -- Arthur Zakirov Postgres Professional: http://www.postgrespro.com Russian Postgres Company
Re: [PROPOSAL] Shared Ispell dictionaries
On Wed, May 16, 2018 at 7:36 AM, Arthur Zakirov wrote: >> I don't quite understand the problem you're trying to solve here, but: >> >> 1. Unless dsm_pin_segment() is called, a DSM segment will >> automatically be removed when there are no remaining mappings. >> >> 2. Unless dsm_pin_mapping() is called, a DSM segment will be unmapped >> when the currently-in-scope resource owner is cleaned up, like at the >> end of the query. If it is called, then the mapping will stick around >> until the backend exits. > > I tried to solve the case when DSM segment remains mapped even a > dictionary was dropped. It may happen in the following situation: > > Backend 1: > > =# select ts_lexize('english_shared', 'test'); > -- The dictionary is loaded into DSM, the segment and the mapping is > pinned > ... > -- Call ts_lexize() from backend 2 below > =# drop text search dictionary english_shared; > -- The segment and the mapping is unpinned, see ts_dict_shmem_release() > > Backend 2: > > =# select ts_lexize('english_shared', 'test'); > -- The dictionary got from DSM, the mapping is pinned > ... > -- The dictionary was dropped by backend 1, but the mapping still is > pinned Yeah, there's really nothing we can do about that (except switch from processes to threads). There's no way for one process to force another process to unmap something. As you've observed, you can get it to be dropped eventually, but not immediately. > In sum, I think the problem is mostly solved. Backend 2 unpins the > segment in next ts_lexize() call. But if backend 2 doesn't call > ts_lexize() (or other TS function) anymore the segment will remain mapped. > It is the only problem I see for now. Maybe you could use CacheRegisterSyscacheCallback to get a callback when the backend notices that a DROP has occurred. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [PROPOSAL] Shared Ispell dictionaries
Hello, On Tue, May 15, 2018 at 05:02:43PM -0400, Robert Haas wrote: > On Tue, Mar 27, 2018 at 8:19 AM, Arthur Zakirov wrote: > > Yes, there is dsm_pin_mapping() for this. But it is necessary to keep a > > segment even if there are no attached processes. From 0003: > > > > + /* Remain attached until end of postmaster */ > > + dsm_pin_segment(seg); > > + /* Remain attached until end of session */ > > + dsm_pin_mapping(seg); > > I don't quite understand the problem you're trying to solve here, but: > > 1. Unless dsm_pin_segment() is called, a DSM segment will > automatically be removed when there are no remaining mappings. > > 2. Unless dsm_pin_mapping() is called, a DSM segment will be unmapped > when the currently-in-scope resource owner is cleaned up, like at the > end of the query. If it is called, then the mapping will stick around > until the backend exits. I tried to solve the case when a DSM segment remains mapped even after a dictionary was dropped. It may happen in the following situation: Backend 1: =# select ts_lexize('english_shared', 'test'); -- The dictionary is loaded into DSM, the segment and the mapping are pinned ... -- Call ts_lexize() from backend 2 below =# drop text search dictionary english_shared; -- The segment and the mapping are unpinned, see ts_dict_shmem_release() Backend 2: =# select ts_lexize('english_shared', 'test'); -- The dictionary is fetched from DSM, the mapping is pinned ... -- The dictionary was dropped by backend 1, but the mapping is still pinned As you can see, the DSM segment is still pinned by backend 2. Later I fixed it by checking whether we need to unpin segments. In the current version of the patch do_ts_dict_shmem_release() is called in lookup_ts_dictionary_cache(). It unpins segments if the text search cache was invalidated. It unpins all segments, but I think it is ok since text search changes should be infrequent. 
> If you pin the mapping or the segment and later no longer want it > pinned, there are dsm_unpin_mapping() and dsm_unpin_segment() > functions available, too. So it seems like what you might want to do > is pin the segment when it's created, and then unpin it if it's > stale/obsolete. The latter won't remove it immediately, but will once > all the mappings are gone. Yes, dsm_unpin_mapping() and dsm_unpin_segment() will be called when the dictionary is dropped or altered in the current version of the patch. I described the approach above. In sum, I think the problem is mostly solved. Backend 2 unpins the segment in the next ts_lexize() call. But if backend 2 doesn't call ts_lexize() (or other TS function) anymore, the segment will remain mapped. It is the only problem I see for now. I hope the description is clear. I attached the rebased patch. -- Arthur Zakirov Postgres Professional: http://www.postgrespro.com Russian Postgres Company diff --git a/src/backend/tsearch/spell.c b/src/backend/tsearch/spell.c index 6f5b635413..09297e384c 100644 --- a/src/backend/tsearch/spell.c +++ b/src/backend/tsearch/spell.c @@ -78,6 +78,8 @@ #define tmpalloc(sz) MemoryContextAlloc(Conf->buildCxt, (sz)) #define tmpalloc0(sz) MemoryContextAllocZero(Conf->buildCxt, (sz)) +#define tmpstrdup(str) MemoryContextStrdup(Conf->buildCxt, (str)) + /* * Prepare for constructing an ISpell dictionary. * @@ -498,7 +500,7 @@ NIAddSpell(IspellDict *Conf, const char *word, const char *flag) Conf->Spell[Conf->nspell] = (SPELL *) tmpalloc(SPELLHDRSZ + strlen(word) + 1); strcpy(Conf->Spell[Conf->nspell]->word, word); Conf->Spell[Conf->nspell]->p.flag = (*flag != '\0') - ? cpstrdup(Conf, flag) : VoidString; + ? 
tmpstrdup(flag) : VoidString; Conf->nspell++; } @@ -1040,7 +1042,7 @@ setCompoundAffixFlagValue(IspellDict *Conf, CompoundAffixFlag *entry, entry->flag.i = i; } else - entry->flag.s = cpstrdup(Conf, s); + entry->flag.s = tmpstrdup(s); entry->flagMode = Conf->flagMode; entry->value = val; @@ -1541,6 +1543,9 @@ nextline: return; isnewformat: + pfree(recoded); + pfree(pstr); + if (oldformat) ereport(ERROR, (errcode(ERRCODE_CONFIG_FILE_ERROR), diff --git a/contrib/dict_int/dict_int.c b/contrib/dict_int/dict_int.c index 56ede37089..8dd4959028 100644 --- a/contrib/dict_int/dict_int.c +++ b/contrib/dict_int/dict_int.c @@ -30,7 +30,7 @@ PG_FUNCTION_INFO_V1(dintdict_lexize); Datum dintdict_init(PG_FUNCTION_ARGS) { - List *dictoptions = (List *) PG_GETARG_POINTER(0); + DictInitData *init_data = (DictInitData *) PG_GETARG_POINTER(0); DictInt*d; ListCell *l; @@ -38,7 +38,7 @@ dintdict_init(PG_FUNCTION_ARGS) d->maxlen = 6; d->rejectlong = false; - foreach(l, dictoptions) + foreach(l, init_data->dict_options) { DefElem*defel =
Re: [PROPOSAL] Shared Ispell dictionaries
On Tue, Mar 27, 2018 at 8:19 AM, Arthur Zakirov wrote: >> I assume the DSM infrastructure already has some solution for getting >> rid of DSM segments when the last interested process disconnects, >> so maybe you could piggyback on that somehow. > > Yes, there is dsm_pin_mapping() for this. But it is necessary to keep a > segment even if there are no attached processes. From 0003: > > + /* Remain attached until end of postmaster */ > + dsm_pin_segment(seg); > + /* Remain attached until end of session */ > + dsm_pin_mapping(seg); I don't quite understand the problem you're trying to solve here, but: 1. Unless dsm_pin_segment() is called, a DSM segment will automatically be removed when there are no remaining mappings. 2. Unless dsm_pin_mapping() is called, a DSM segment will be unmapped when the currently-in-scope resource owner is cleaned up, like at the end of the query. If it is called, then the mapping will stick around until the backend exits. If you pin the mapping or the segment and later no longer want it pinned, there are dsm_unpin_mapping() and dsm_unpin_segment() functions available, too. So it seems like what you might want to do is pin the segment when it's created, and then unpin it if it's stale/obsolete. The latter won't remove it immediately, but will once all the mappings are gone. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [PROPOSAL] Shared Ispell dictionaries
On Thu, Mar 29, 2018 at 02:03:07AM +0300, Arthur Zakirov wrote: > Here is the new version of the patch. Please find the attached new version of the patch. I removed refcnt because it is useless: it doesn't guarantee that a hash table entry will be removed. I fixed a bug: dsm_unpin_segment() could be called twice if the transaction that called it was aborted and another transaction then calls ts_dict_shmem_release(). I added segment_ispinned to fix it. -- Arthur Zakirov Postgres Professional: http://www.postgrespro.com Russian Postgres Company diff --git a/src/backend/tsearch/spell.c b/src/backend/tsearch/spell.c index b9fdd77e19..e071994523 100644 --- a/src/backend/tsearch/spell.c +++ b/src/backend/tsearch/spell.c @@ -78,6 +78,8 @@ #define tmpalloc(sz) MemoryContextAlloc(Conf->buildCxt, (sz)) #define tmpalloc0(sz) MemoryContextAllocZero(Conf->buildCxt, (sz)) +#define tmpstrdup(str) MemoryContextStrdup(Conf->buildCxt, (str)) + /* * Prepare for constructing an ISpell dictionary. * @@ -498,7 +500,7 @@ NIAddSpell(IspellDict *Conf, const char *word, const char *flag) Conf->Spell[Conf->nspell] = (SPELL *) tmpalloc(SPELLHDRSZ + strlen(word) + 1); strcpy(Conf->Spell[Conf->nspell]->word, word); Conf->Spell[Conf->nspell]->p.flag = (*flag != '\0') - ? cpstrdup(Conf, flag) : VoidString; + ? 
tmpstrdup(flag) : VoidString; Conf->nspell++; } @@ -1040,7 +1042,7 @@ setCompoundAffixFlagValue(IspellDict *Conf, CompoundAffixFlag *entry, entry->flag.i = i; } else - entry->flag.s = cpstrdup(Conf, s); + entry->flag.s = tmpstrdup(s); entry->flagMode = Conf->flagMode; entry->value = val; @@ -1536,6 +1538,9 @@ nextline: return; isnewformat: + pfree(recoded); + pfree(pstr); + if (oldformat) ereport(ERROR, (errcode(ERRCODE_CONFIG_FILE_ERROR), diff --git a/contrib/dict_int/dict_int.c b/contrib/dict_int/dict_int.c index 56ede37089..8dd4959028 100644 --- a/contrib/dict_int/dict_int.c +++ b/contrib/dict_int/dict_int.c @@ -30,7 +30,7 @@ PG_FUNCTION_INFO_V1(dintdict_lexize); Datum dintdict_init(PG_FUNCTION_ARGS) { - List *dictoptions = (List *) PG_GETARG_POINTER(0); + DictInitData *init_data = (DictInitData *) PG_GETARG_POINTER(0); DictInt*d; ListCell *l; @@ -38,7 +38,7 @@ dintdict_init(PG_FUNCTION_ARGS) d->maxlen = 6; d->rejectlong = false; - foreach(l, dictoptions) + foreach(l, init_data->dict_options) { DefElem*defel = (DefElem *) lfirst(l); diff --git a/contrib/dict_xsyn/dict_xsyn.c b/contrib/dict_xsyn/dict_xsyn.c index a79ece240c..0b8a32d459 100644 --- a/contrib/dict_xsyn/dict_xsyn.c +++ b/contrib/dict_xsyn/dict_xsyn.c @@ -140,7 +140,7 @@ read_dictionary(DictSyn *d, const char *filename) Datum dxsyn_init(PG_FUNCTION_ARGS) { - List *dictoptions = (List *) PG_GETARG_POINTER(0); + DictInitData *init_data = (DictInitData *) PG_GETARG_POINTER(0); DictSyn*d; ListCell *l; char *filename = NULL; @@ -153,7 +153,7 @@ dxsyn_init(PG_FUNCTION_ARGS) d->matchsynonyms = false; d->keepsynonyms = true; - foreach(l, dictoptions) + foreach(l, init_data->dict_options) { DefElem*defel = (DefElem *) lfirst(l); diff --git a/contrib/unaccent/unaccent.c b/contrib/unaccent/unaccent.c index 247c202755..2a2fbee5fa 100644 --- a/contrib/unaccent/unaccent.c +++ b/contrib/unaccent/unaccent.c @@ -267,12 +267,12 @@ PG_FUNCTION_INFO_V1(unaccent_init); Datum unaccent_init(PG_FUNCTION_ARGS) { - List 
*dictoptions = (List *) PG_GETARG_POINTER(0); + DictInitData *init_data = (DictInitData *) PG_GETARG_POINTER(0); TrieChar *rootTrie = NULL; boolfileloaded = false; ListCell *l; - foreach(l, dictoptions) + foreach(l, init_data->dict_options) { DefElem*defel = (DefElem *) lfirst(l); diff --git a/src/backend/commands/tsearchcmds.c b/src/backend/commands/tsearchcmds.c index 3a843512d1..3753e32b2c 100644 --- a/src/backend/commands/tsearchcmds.c +++ b/src/backend/commands/tsearchcmds.c @@ -386,17 +386,25 @@ verify_dictoptions(Oid tmplId, List *dictoptions) } else { + DictInitData init_data; + /* * Copy the options just in case init method thinks it can scribble on * them ... */ dictoptions = copyObject(dictoptions); + init_data.dict_options = dictoptions; + init_data.dict.id = InvalidOid; + init_data.dict.xmin = InvalidTransactionId; + init_data.dict.xmax = InvalidTransactionId; + ItemPointerSetInvalid(_data.dict.tid); + /*
Re: [PROPOSAL] Shared Ispell dictionaries
Tomas Vondra wrote: > > On 03/31/2018 12:42 PM, Arthur Zakirov wrote: > > Hello all, > > > > I'd like to add new optional function to text search template named fini > > in addition to init() and lexize(). It will be called by > > RemoveTSDictionaryById() and AlterTSDictionary(). dispell_fini() will > > call ts_dict_shmem_release(). > > > > It doesn't change segments leaking situation. I think it makes text > > search API more transparent. > > > > If it doesn't actually solve the problem, why add it? I don't see a > point in adding functions for the sake of transparency, when it does not > in fact serve any use cases. It doesn't solve the problem. But it brings more clarity: if a dictionary requested a shared location, then it should release/unpin it. There is no such scenario yet, but someone might want to release not only the shared segment but also other private data. > Can't we handle the segment-leaking by adding some sort of tombstone? It is interesting that there are such tombstones already, without the patch. TSDictionaryCacheEntry entries aren't deleted after DROP; they are just marked isvalid = false. > For example, imagine that instead of removing the hash table entry we > mark it as 'dropped'. And after that, after the lookup we would know the > dictionary was removed, and the backends would load the dictionary into > their private memory. > > Of course, this could mean we end up having many tombstones in the hash > table. But those tombstones would be tiny, making it less painful than > potentially leaking much more memory for the dictionaries. Now it actually isn't guaranteed that the hash table entry will be removed, even if refcnt is 0. So I think I should remove refcnt and entries won't be removed. There are no big problems with leaking now. Memory may leak only if a dictionary was dropped or altered, there is no text search workload anymore, and the backend is still alive. 
Because the next use of text search functions will unpin segments previously used by invalid dictionaries (isvalid == false). Also, the segment is unpinned when the backend terminates. The segment is destroyed when all interested processes unpin it (as Tom noted), and the hash table entry becomes a tombstone. I hope I described it clearly. > Also, I wonder if we might actually remove the dictionaries after a > while, e.g. based on XID. Imagine that we note the XID of the > transaction removing the dictionary, or perhaps XID of the most recent > running transaction. Then we could use this to decide if all running > transactions actually see the DROP, and we could remove the tombstone. Maybe autovacuum should work here too :) It is a joke, of course. I'm not very familiar with removing dead tuples, but I think this is a similar case. -- Arthur Zakirov Postgres Professional: http://www.postgrespro.com Russian Postgres Company
Re: [PROPOSAL] Shared Ispell dictionaries
On 03/31/2018 12:42 PM, Arthur Zakirov wrote: > Hello all, > > I'd like to add new optional function to text search template named fini > in addition to init() and lexize(). It will be called by > RemoveTSDictionaryById() and AlterTSDictionary(). dispell_fini() will > call ts_dict_shmem_release(). > > It doesn't change segments leaking situation. I think it makes text > search API more transparent. > If it doesn't actually solve the problem, why add it? I don't see a point in adding functions for the sake of transparency, when it does not in fact serve any use cases. Can't we handle the segment-leaking by adding some sort of tombstone? For example, imagine that instead of removing the hash table entry we mark it as 'dropped'. And after that, after the lookup we would know the dictionary was removed, and the backends would load the dictionary into their private memory. Of course, this could mean we end up having many tombstones in the hash table. But those tombstones would be tiny, making it less painful than potentially leaking much more memory for the dictionaries. Also, I wonder if we might actually remove the dictionaries after a while, e.g. based on XID. Imagine that we note the XID of the transaction removing the dictionary, or perhaps XID of the most recent running transaction. Then we could use this to decide if all running transactions actually see the DROP, and we could remove the tombstone. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PROPOSAL] Shared Ispell dictionaries
Hello all, I'd like to add a new optional function, named fini, to the text search template in addition to init() and lexize(). It will be called by RemoveTSDictionaryById() and AlterTSDictionary(). dispell_fini() will call ts_dict_shmem_release(). It doesn't change the segment-leaking situation. I think it makes the text search API more transparent. I'll update the existing documentation. And I think I can add text search API documentation in the 2018-09 commitfest, as Tom noted that it doesn't exist. Any thoughts? -- Arthur Zakirov Postgres Professional: http://www.postgrespro.com Russian Postgres Company
Re: [PROPOSAL] Shared Ispell dictionaries
Here is the new version of the patch. Now RemoveTSDictionaryById() and AlterTSDictionary() unpin the dictionary DSM segment. So if all attached backends disconnect, allocated DSM segments will be released. lookup_ts_dictionary_cache() may unpin the DSM mapping for all invalid dictionary cache entries. I added xmax in DictPointerData. It is used as a lookup key now too. It helps to reload a dictionary after a rolled-back DROP command. There was a bug in ts_dict_shmem_location(); I fixed it. -- Arthur Zakirov Postgres Professional: http://www.postgrespro.com Russian Postgres Company diff --git a/src/backend/tsearch/spell.c b/src/backend/tsearch/spell.c index b9fdd77e19..e071994523 100644 --- a/src/backend/tsearch/spell.c +++ b/src/backend/tsearch/spell.c @@ -78,6 +78,8 @@ #define tmpalloc(sz) MemoryContextAlloc(Conf->buildCxt, (sz)) #define tmpalloc0(sz) MemoryContextAllocZero(Conf->buildCxt, (sz)) +#define tmpstrdup(str) MemoryContextStrdup(Conf->buildCxt, (str)) + /* * Prepare for constructing an ISpell dictionary. * @@ -498,7 +500,7 @@ NIAddSpell(IspellDict *Conf, const char *word, const char *flag) Conf->Spell[Conf->nspell] = (SPELL *) tmpalloc(SPELLHDRSZ + strlen(word) + 1); strcpy(Conf->Spell[Conf->nspell]->word, word); Conf->Spell[Conf->nspell]->p.flag = (*flag != '\0') - ? cpstrdup(Conf, flag) : VoidString; + ? 
tmpstrdup(flag) : VoidString; Conf->nspell++; } @@ -1040,7 +1042,7 @@ setCompoundAffixFlagValue(IspellDict *Conf, CompoundAffixFlag *entry, entry->flag.i = i; } else - entry->flag.s = cpstrdup(Conf, s); + entry->flag.s = tmpstrdup(s); entry->flagMode = Conf->flagMode; entry->value = val; @@ -1536,6 +1538,9 @@ nextline: return; isnewformat: + pfree(recoded); + pfree(pstr); + if (oldformat) ereport(ERROR, (errcode(ERRCODE_CONFIG_FILE_ERROR), diff --git a/contrib/dict_int/dict_int.c b/contrib/dict_int/dict_int.c index 56ede37089..8dd4959028 100644 --- a/contrib/dict_int/dict_int.c +++ b/contrib/dict_int/dict_int.c @@ -30,7 +30,7 @@ PG_FUNCTION_INFO_V1(dintdict_lexize); Datum dintdict_init(PG_FUNCTION_ARGS) { - List *dictoptions = (List *) PG_GETARG_POINTER(0); + DictInitData *init_data = (DictInitData *) PG_GETARG_POINTER(0); DictInt*d; ListCell *l; @@ -38,7 +38,7 @@ dintdict_init(PG_FUNCTION_ARGS) d->maxlen = 6; d->rejectlong = false; - foreach(l, dictoptions) + foreach(l, init_data->dict_options) { DefElem*defel = (DefElem *) lfirst(l); diff --git a/contrib/dict_xsyn/dict_xsyn.c b/contrib/dict_xsyn/dict_xsyn.c index a79ece240c..0b8a32d459 100644 --- a/contrib/dict_xsyn/dict_xsyn.c +++ b/contrib/dict_xsyn/dict_xsyn.c @@ -140,7 +140,7 @@ read_dictionary(DictSyn *d, const char *filename) Datum dxsyn_init(PG_FUNCTION_ARGS) { - List *dictoptions = (List *) PG_GETARG_POINTER(0); + DictInitData *init_data = (DictInitData *) PG_GETARG_POINTER(0); DictSyn*d; ListCell *l; char *filename = NULL; @@ -153,7 +153,7 @@ dxsyn_init(PG_FUNCTION_ARGS) d->matchsynonyms = false; d->keepsynonyms = true; - foreach(l, dictoptions) + foreach(l, init_data->dict_options) { DefElem*defel = (DefElem *) lfirst(l); diff --git a/contrib/unaccent/unaccent.c b/contrib/unaccent/unaccent.c index 247c202755..2a2fbee5fa 100644 --- a/contrib/unaccent/unaccent.c +++ b/contrib/unaccent/unaccent.c @@ -267,12 +267,12 @@ PG_FUNCTION_INFO_V1(unaccent_init); Datum unaccent_init(PG_FUNCTION_ARGS) { - List 
*dictoptions = (List *) PG_GETARG_POINTER(0); + DictInitData *init_data = (DictInitData *) PG_GETARG_POINTER(0); TrieChar *rootTrie = NULL; boolfileloaded = false; ListCell *l; - foreach(l, dictoptions) + foreach(l, init_data->dict_options) { DefElem*defel = (DefElem *) lfirst(l); diff --git a/src/backend/commands/tsearchcmds.c b/src/backend/commands/tsearchcmds.c index 3a843512d1..3753e32b2c 100644 --- a/src/backend/commands/tsearchcmds.c +++ b/src/backend/commands/tsearchcmds.c @@ -386,17 +386,25 @@ verify_dictoptions(Oid tmplId, List *dictoptions) } else { + DictInitData init_data; + /* * Copy the options just in case init method thinks it can scribble on * them ... */ dictoptions = copyObject(dictoptions); + init_data.dict_options = dictoptions; + init_data.dict.id = InvalidOid; + init_data.dict.xmin = InvalidTransactionId; + init_data.dict.xmax = InvalidTransactionId; +
Re: [PROPOSAL] Shared Ispell dictionaries
Please find the attached new version of the patch. I got rid of the 0005 and 0006 parts. There is no max_shared_dictionaries_size variable, Shareable option, or pg_ts_shared_dictionaries view anymore. On Sat, Mar 24, 2018 at 04:56:36PM -0400, Tom Lane wrote: > I do think it's required that changing the dictionary's options with > ALTER TEXT SEARCH DICTIONARY automatically cause a reload; but if that's > happening with this patch, I don't see where. (It might work to use > the combination of dictionary OID and TID of the dictionary's pg_ts_dict > tuple as the lookup key for shared dictionaries. Oh, and have you > thought about the possibility of conflicting OIDs in different DBs? > Probably the database OID has to be part of the key, as well.) The database OID, the dictionary OID, TID and XMIN are now used as the lookup key. > Also, the scheme for releasing the dictionary DSM during > RemoveTSDictionaryById is uncertain and full of race conditions: > the DROP might roll back later, or someone might come along and > start using the dictionary (causing a fresh DSM load) before the > DROP commits and makes the dictionary invisible to other sessions. > I don't think that either of those are necessarily fatal objections, > but there needs to be some commentary there explaining what happens. The dictionary's DSM segment now stays alive until the postmaster terminates. But when the dictionary is dropped or altered, the previous (now invalid) segment is unpinned. The segment itself is released when all backends unpin the mapping in lookup_ts_parser_cache() or disconnect. The problem arises when some process used the dictionary before it was dropped or altered, doesn't use it afterwards, and lives for a very long time. In this situation the mapping isn't unpinned and the segment isn't released. The other problem is that the TsearchDictEntry isn't removed if ts_dict_shmem_release() wasn't called. That may happen after dropping the dictionary. 
> BTW, I was going to complain that this patch alters the API for > dictionary template init functions without any documentation updates; > but then I realized that there isn't any documentation to update. > That pretty well sucks, but I suppose it's not the job of this patch > to improve that situation. Still, you could spend a bit more effort on > the commentary in ts_public.h in 0002, because that commentary is as > close to an API spec as we've got. I improved a little bit the commentary in ts_public.h. -- Arthur Zakirov Postgres Professional: http://www.postgrespro.com Russian Postgres Company diff --git a/src/backend/tsearch/spell.c b/src/backend/tsearch/spell.c index b9fdd77e19..e071994523 100644 --- a/src/backend/tsearch/spell.c +++ b/src/backend/tsearch/spell.c @@ -78,6 +78,8 @@ #define tmpalloc(sz) MemoryContextAlloc(Conf->buildCxt, (sz)) #define tmpalloc0(sz) MemoryContextAllocZero(Conf->buildCxt, (sz)) +#define tmpstrdup(str) MemoryContextStrdup(Conf->buildCxt, (str)) + /* * Prepare for constructing an ISpell dictionary. * @@ -498,7 +500,7 @@ NIAddSpell(IspellDict *Conf, const char *word, const char *flag) Conf->Spell[Conf->nspell] = (SPELL *) tmpalloc(SPELLHDRSZ + strlen(word) + 1); strcpy(Conf->Spell[Conf->nspell]->word, word); Conf->Spell[Conf->nspell]->p.flag = (*flag != '\0') - ? cpstrdup(Conf, flag) : VoidString; + ? 
tmpstrdup(flag) : VoidString; Conf->nspell++; } @@ -1040,7 +1042,7 @@ setCompoundAffixFlagValue(IspellDict *Conf, CompoundAffixFlag *entry, entry->flag.i = i; } else - entry->flag.s = cpstrdup(Conf, s); + entry->flag.s = tmpstrdup(s); entry->flagMode = Conf->flagMode; entry->value = val; @@ -1536,6 +1538,9 @@ nextline: return; isnewformat: + pfree(recoded); + pfree(pstr); + if (oldformat) ereport(ERROR, (errcode(ERRCODE_CONFIG_FILE_ERROR), diff --git a/contrib/dict_int/dict_int.c b/contrib/dict_int/dict_int.c index 56ede37089..8dd4959028 100644 --- a/contrib/dict_int/dict_int.c +++ b/contrib/dict_int/dict_int.c @@ -30,7 +30,7 @@ PG_FUNCTION_INFO_V1(dintdict_lexize); Datum dintdict_init(PG_FUNCTION_ARGS) { - List *dictoptions = (List *) PG_GETARG_POINTER(0); + DictInitData *init_data = (DictInitData *) PG_GETARG_POINTER(0); DictInt*d; ListCell *l; @@ -38,7 +38,7 @@ dintdict_init(PG_FUNCTION_ARGS) d->maxlen = 6; d->rejectlong = false; - foreach(l, dictoptions) + foreach(l, init_data->dict_options) { DefElem*defel = (DefElem *) lfirst(l); diff --git a/contrib/dict_xsyn/dict_xsyn.c b/contrib/dict_xsyn/dict_xsyn.c index a79ece240c..0b8a32d459 100644 --- a/contrib/dict_xsyn/dict_xsyn.c +++ b/contrib/dict_xsyn/dict_xsyn.c @@ -140,7 +140,7 @@ read_dictionary(DictSyn *d, const char *filename) Datum
Re: [PROPOSAL] Shared Ispell dictionaries
On Mon, Mar 26, 2018 at 11:27:48AM -0400, Tom Lane wrote:
> Arthur Zakirov writes:
> > I'm not sure that I understood the second case correctly. Can cache
> > invalidation help in this case? I don't have confident knowledge of cache
> > invalidation. It seems to me that InvalidateTSCacheCallBack() should
> > release the segment after commit.
>
> "Release after commit" sounds like a pretty dangerous design to me,
> because a release necessarily implies some kernel calls, which could
> fail.  We can't afford to inject steps that might fail into post-commit
> cleanup (because it's too late to recover by failing the transaction).
> It'd be better to do cleanup while searching for a dictionary to use.
>
> I assume the DSM infrastructure already has some solution for getting
> rid of DSM segments when the last interested process disconnects,
> so maybe you could piggyback on that somehow.

Yes, there is dsm_pin_mapping() for this. But we also need to keep a
segment alive even when no process is attached to it. From 0003:

+	/* Remain attached until end of postmaster */
+	dsm_pin_segment(seg);
+	/* Remain attached until end of session */
+	dsm_pin_mapping(seg);

-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
Re: [PROPOSAL] Shared Ispell dictionaries
Arthur Zakirov writes:
> On Sun, Mar 25, 2018 at 12:18:10AM -0400, Tom Lane wrote:
>> My thought was (a) the ROLLBACK case is ok, because the next use of
>> the dictionary will reload it, and (b) the reload-concurrently-with-
>> DROP case is annoying, because indeed it leaks, but the window is small
>> and it probably won't be an issue in practice.  We would need to be
>> sure that the DSM segment goes away at postmaster restart, but given
>> that I think it'd be tolerable.  Of course it'd be better not to have
>> the race, but I see no easy way to prevent it -- do you?

> I'm not sure that I understood the second case correctly. Can cache
> invalidation help in this case? I don't have confident knowledge of cache
> invalidation. It seems to me that InvalidateTSCacheCallBack() should
> release the segment after commit.

"Release after commit" sounds like a pretty dangerous design to me,
because a release necessarily implies some kernel calls, which could
fail.  We can't afford to inject steps that might fail into post-commit
cleanup (because it's too late to recover by failing the transaction).
It'd be better to do cleanup while searching for a dictionary to use.

I assume the DSM infrastructure already has some solution for getting
rid of DSM segments when the last interested process disconnects,
so maybe you could piggyback on that somehow.

			regards, tom lane
Re: [PROPOSAL] Shared Ispell dictionaries
On Sun, Mar 25, 2018 at 02:28:29PM -0400, Tom Lane wrote:
> Arthur Zakirov writes:
> > If all dictionaries will be shareable then this view could be removed.
> > Unfortunately I think it can't help with leaked segments, I didn't find
> > a way to iterate dshash entries. That's why pg_ts_shared_dictionaries()
> > scans the pg_ts_dict table instead of scanning the dshash table.
>
> If you're scanning pg_ts_dict, what happens with dictionaries belonging
> to other databases?  They won't be visible in your local copy of
> pg_ts_dict.  Between that and the inability to find leaked segments,
> I'm not seeing that this has much use-case.

Indeed, scanning pg_ts_dict is the wrong approach here, and
pg_ts_shared_dictionaries() is definitely broken.

> > Yes unfortunately ALTER TEXT SEARCH DICTIONARY doesn't reload a
> > dictionary. TID can help here. I thought about using XID too when I
> > started to work on the RELOAD command. But I'm not sure that it is a
> > good idea; anyway, XID isn't needed in the current version.
>
> Actually, existing practice is to check both xmin and tid; see for example
> where plpgsql checks if a cached function data structure still matches the
> pg_proc row, pl_comp.c around line 175 in HEAD.  The other PLs do it
> similarly I think.  I'm not sure offhand just how much that changes the
> risks of a false match compared to testing only one of these fields, but
> I'd recommend conforming to the way it's done elsewhere.

Thank you for pointing that out! I think it shouldn't be hard to use both
xmin and tid.

-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
Re: [PROPOSAL] Shared Ispell dictionaries
On Sun, Mar 25, 2018 at 12:18:10AM -0400, Tom Lane wrote:
> My thought was (a) the ROLLBACK case is ok, because the next use of
> the dictionary will reload it, and (b) the reload-concurrently-with-
> DROP case is annoying, because indeed it leaks, but the window is small
> and it probably won't be an issue in practice.  We would need to be
> sure that the DSM segment goes away at postmaster restart, but given
> that I think it'd be tolerable.  Of course it'd be better not to have
> the race, but I see no easy way to prevent it -- do you?

I'm not sure that I understood the second case correctly. Can cache
invalidation help in this case? I don't have confident knowledge of cache
invalidation, but it seems to me that InvalidateTSCacheCallBack() should
release the segment after commit.

But the cache isn't invalidated if a backend was terminated after a
dictionary reload. on_shmem_exit() could help, but we would need a list
of leaked dictionaries for that.

P.S. I don't think it is right to release all dictionary segments in
InvalidateTSCacheCallBack(): otherwise any DROP could release all
segments. It would be better to release only the specific dictionary.

-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
Re: [PROPOSAL] Shared Ispell dictionaries
Arthur Zakirov writes:
> On Sat, Mar 24, 2018 at 04:56:36PM -0400, Tom Lane wrote:
>> * And that leads us to not particularly need a view telling which
>> dictionaries are loaded, either.  It's just an implementation detail
>> that users don't need to worry about.

> If all dictionaries will be shareable then this view could be removed.
> Unfortunately I think it can't help with leaked segments, I didn't find
> a way to iterate dshash entries. That's why pg_ts_shared_dictionaries()
> scans the pg_ts_dict table instead of scanning the dshash table.

If you're scanning pg_ts_dict, what happens with dictionaries belonging
to other databases?  They won't be visible in your local copy of
pg_ts_dict.  Between that and the inability to find leaked segments,
I'm not seeing that this has much use-case.

>> (It might work to use
>> the combination of dictionary OID and TID of the dictionary's pg_ts_dict
>> tuple as the lookup key for shared dictionaries.  Oh, and have you
>> thought about the possibility of conflicting OIDs in different DBs?
>> Probably the database OID has to be part of the key, as well.)

> Yes unfortunately ALTER TEXT SEARCH DICTIONARY doesn't reload a
> dictionary. TID can help here. I thought about using XID too when I
> started to work on the RELOAD command. But I'm not sure that it is a
> good idea; anyway, XID isn't needed in the current version.

Actually, existing practice is to check both xmin and tid; see for example
where plpgsql checks if a cached function data structure still matches the
pg_proc row, pl_comp.c around line 175 in HEAD.  The other PLs do it
similarly I think.  I'm not sure offhand just how much that changes the
risks of a false match compared to testing only one of these fields, but
I'd recommend conforming to the way it's done elsewhere.

			regards, tom lane
Re: [PROPOSAL] Shared Ispell dictionaries
On Sun, Mar 25, 2018 at 06:45:08AM +0200, Tomas Vondra wrote:
> FWIW this is where the view listing dictionaries loaded into shared
> memory would be helpful - you'd at least know there's a dictionary,
> wasting memory.

Unfortunately, it seems that this view can't help with listing leaked
segments: I didn't find a way to list dshash entries. Currently
pg_ts_shared_dictionaries() scans the pg_ts_dict table and gets a dshash
item using dictId. In the case of leaked dictionaries we don't know
their identifiers.

-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
Re: [PROPOSAL] Shared Ispell dictionaries
On Sat, Mar 24, 2018 at 04:56:36PM -0400, Tom Lane wrote:
> Arthur Zakirov writes:
> > [ v10 patch versions ]

Thank you for the review, Tom!

Tomas Vondra wrote:
> Tom Lane wrote:
>> * I cannot imagine a use-case for setting max_shared_dictionaries_size
>> to anything except "unlimited".  If it's not that, and you exceed it,
>> then subsequent backends load private copies of the dictionary, making
>> your memory situation rapidly worse not better.  I think we should lose
>> that GUC altogether and just load dictionaries automatically.
>
> Introduction of that limit is likely my fault. It came from an
> extension I wrote a long time ago, but back then it was a necessity
> because we did not have DSM. So in retrospect I agree with you - it's
> not particularly useful and we should ditch it.
>
> Arthur, let this be a lesson for you! You have to start fighting
> against bogus feature requests from other people ;-)

Yeah, in this sense max_shared_dictionaries_size is pointless. I'll
remove it then :).

> * Similarly, I see no point in a "sharable" option on individual
> dictionaries, especially when there's only one allowed setting for
> most dictionary types.  Let's lose that too.

I think the "Shareable" option could be useful only if building a shared
dictionary took much longer than building a non-shared one. It is
slightly longer because of an additional memcpy(), but the difference
isn't noticeable, I think. So it is worth removing the option.

> * And that leads us to not particularly need a view telling which
> dictionaries are loaded, either.  It's just an implementation detail
> that users don't need to worry about.

If all dictionaries are shareable then this view can be removed.
Unfortunately, I don't think it can help with leaked segments: I didn't
find a way to iterate dshash entries. That's why
pg_ts_shared_dictionaries() scans the pg_ts_dict table instead of
scanning the dshash table.
> I do think it's required that changing the dictionary's options with
> ALTER TEXT SEARCH DICTIONARY automatically cause a reload; but if that's
> happening with this patch, I don't see where.  (It might work to use
> the combination of dictionary OID and TID of the dictionary's pg_ts_dict
> tuple as the lookup key for shared dictionaries.  Oh, and have you
> thought about the possibility of conflicting OIDs in different DBs?
> Probably the database OID has to be part of the key, as well.)

Yes, unfortunately ALTER TEXT SEARCH DICTIONARY doesn't reload a
dictionary. TID can help here. I thought about using XID too when I
started to work on the RELOAD command, but I'm not sure that it is a
good idea; anyway, XID isn't needed in the current version.

> Also, the scheme for releasing the dictionary DSM during
> RemoveTSDictionaryById is uncertain and full of race conditions:
> the DROP might roll back later, or someone might come along and
> start using the dictionary (causing a fresh DSM load) before the
> DROP commits and makes the dictionary invisible to other sessions.
> I don't think that either of those are necessarily fatal objections,
> but there needs to be some commentary there explaining what happens.

I missed this case. As you wrote below, the ROLLBACK case is OK, but I
don't have a solution for the second case yet. If I can't solve it I'll
add additional comments in RemoveTSConfigurationById() and maybe in the
documentation, if that's appropriate.

> BTW, I was going to complain that this patch alters the API for
> dictionary template init functions without any documentation updates;
> but then I realized that there isn't any documentation to update.
> That pretty well sucks, but I suppose it's not the job of this patch
> to improve that situation.  Still, you could spend a bit more effort on
> the commentary in ts_public.h in 0002, because that commentary is as
> close to an API spec as we've got.

I'll fix the comments.
-- Arthur Zakirov Postgres Professional: http://www.postgrespro.com Russian Postgres Company
Re: [PROPOSAL] Shared Ispell dictionaries
Tomas Vondra writes:
> FWIW this is where the view listing dictionaries loaded into shared
> memory would be helpful - you'd at least know there's a dictionary,
> wasting memory.

Well, that's only because we failed to make the implementation
transparent :-(.  But it's not unlikely that an mmap-based implementation
would be simply incapable of supporting such a view: the knowledge of
whether a particular file is mapped in would be pretty much
process-local, I think.  So I'd really rather we don't add that.

Also, while these dictionaries are indeed kind of large relative to our
traditional view of shared memory, if they're in DSM segments that the
kernel can swap out then I really suspect that nobody would much care if
a few such segments had been leaked.  I find it hard to imagine a
use-case where DROP race conditions would lead us to leak so many that
it becomes a serious problem.  Maybe I lack imagination.

			regards, tom lane
Re: [PROPOSAL] Shared Ispell dictionaries
Tomas Vondra writes:
> On 3/24/18 9:56 PM, Tom Lane wrote:
>> Also, the scheme for releasing the dictionary DSM during
>> RemoveTSDictionaryById is uncertain and full of race conditions:
>> the DROP might roll back later, or someone might come along and
>> start using the dictionary (causing a fresh DSM load) before the
>> DROP commits and makes the dictionary invisible to other sessions.
>> I don't think that either of those are necessarily fatal objections,
>> but there needs to be some commentary there explaining what happens.

> Actually, I think that's an issue - such a race condition might easily
> leak the shared memory forever (because the new dictionary will get a
> different OID etc.). It probably is not happening very often, because
> dictionaries are not dropped very often. But it needs fixing I think.

My thought was (a) the ROLLBACK case is ok, because the next use of
the dictionary will reload it, and (b) the reload-concurrently-with-
DROP case is annoying, because indeed it leaks, but the window is small
and it probably won't be an issue in practice.  We would need to be
sure that the DSM segment goes away at postmaster restart, but given
that I think it'd be tolerable.  Of course it'd be better not to have
the race, but I see no easy way to prevent it -- do you?

			regards, tom lane
Re: [PROPOSAL] Shared Ispell dictionaries
Arthur Zakirov writes:
> [ v10 patch versions ]

I took a quick look through this.  I agree with the comments about
mmap-ability not being something we should insist on now, and maybe not
ever.  However, in order to keep our options open, it seems like we
should minimize the amount of API we expose that's based on the current
implementation.  That leads me to the following thoughts:

* I cannot imagine a use-case for setting max_shared_dictionaries_size
to anything except "unlimited".  If it's not that, and you exceed it,
then subsequent backends load private copies of the dictionary, making
your memory situation rapidly worse not better.  I think we should lose
that GUC altogether and just load dictionaries automatically.

* Similarly, I see no point in a "sharable" option on individual
dictionaries, especially when there's only one allowed setting for
most dictionary types.  Let's lose that too.

* And that leads us to not particularly need a view telling which
dictionaries are loaded, either.  It's just an implementation detail
that users don't need to worry about.

This does beg the question of whether we need a way to flush dictionary
contents that's short of restarting the server (or short of dropping and
recreating the dictionary).  I'm not sure, but even if we do, none of
the above is necessary for that.

I do think it's required that changing the dictionary's options with
ALTER TEXT SEARCH DICTIONARY automatically cause a reload; but if that's
happening with this patch, I don't see where.  (It might work to use
the combination of dictionary OID and TID of the dictionary's pg_ts_dict
tuple as the lookup key for shared dictionaries.  Oh, and have you
thought about the possibility of conflicting OIDs in different DBs?
Probably the database OID has to be part of the key, as well.)
Also, the scheme for releasing the dictionary DSM during RemoveTSDictionaryById is uncertain and full of race conditions: the DROP might roll back later, or someone might come along and start using the dictionary (causing a fresh DSM load) before the DROP commits and makes the dictionary invisible to other sessions. I don't think that either of those are necessarily fatal objections, but there needs to be some commentary there explaining what happens. BTW, I was going to complain that this patch alters the API for dictionary template init functions without any documentation updates; but then I realized that there isn't any documentation to update. That pretty well sucks, but I suppose it's not the job of this patch to improve that situation. Still, you could spend a bit more effort on the commentary in ts_public.h in 0002, because that commentary is as close to an API spec as we've got. regards, tom lane
Re: [PROPOSAL] Shared Ispell dictionaries
On Wed, Mar 21, 2018 at 12:00:52PM +0300, Arthur Zakirov wrote:
> On Tue, Mar 20, 2018 at 09:30:15PM +0100, Tomas Vondra wrote:
> > I wonder if these restrictions are needed? I mean, why not allow setting
> > max_shared_dictionaries_size below the size of the loaded dictionaries?
> >
> > Of course, on the one hand those restrictions seem sensible. On the other
> > hand, perhaps in some cases it would be useful to allow violating them?
> >
> > I mean, why not simply disable loading of new dictionaries when
> >
> >     (max_shared_dictionaries_size < loaded_size)
> >
> > Maybe I'm over-thinking this though. It's probably safer and less
> > surprising to enforce the restrictions.
>
> Hm, yes, in some cases this check may be over-engineering. I thought it
> was reasonable and safer in the v7 patch. But there are similar GUCs,
> wal_keep_segments and max_wal_size, which don't do additional checks,
> and people are fine with them. So I removed that check from the
> variable.
>
> Please find the attached new version of the patch.

I forgot to fix the regression tests for max_shared_dictionaries_size.
Also I'm not confident about using pg_reload_conf() in regression tests;
I haven't found a place where pg_reload_conf() is used in tests. So I
removed the max_shared_dictionaries_size tests for now.

Sorry for the noise.

-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

diff --git a/src/backend/tsearch/spell.c b/src/backend/tsearch/spell.c
index b9fdd77e19..e071994523 100644
--- a/src/backend/tsearch/spell.c
+++ b/src/backend/tsearch/spell.c
@@ -78,6 +78,8 @@
 #define tmpalloc(sz)  MemoryContextAlloc(Conf->buildCxt, (sz))
 #define tmpalloc0(sz)  MemoryContextAllocZero(Conf->buildCxt, (sz))
 
+#define tmpstrdup(str)  MemoryContextStrdup(Conf->buildCxt, (str))
+
 /*
  * Prepare for constructing an ISpell dictionary.
  *
@@ -498,7 +500,7 @@ NIAddSpell(IspellDict *Conf, const char *word, const char *flag)
 	Conf->Spell[Conf->nspell] = (SPELL *) tmpalloc(SPELLHDRSZ + strlen(word) + 1);
 	strcpy(Conf->Spell[Conf->nspell]->word, word);
 	Conf->Spell[Conf->nspell]->p.flag = (*flag != '\0')
-		? cpstrdup(Conf, flag) : VoidString;
+		? tmpstrdup(flag) : VoidString;
 	Conf->nspell++;
 }
 
@@ -1040,7 +1042,7 @@ setCompoundAffixFlagValue(IspellDict *Conf, CompoundAffixFlag *entry,
 		entry->flag.i = i;
 	}
 	else
-		entry->flag.s = cpstrdup(Conf, s);
+		entry->flag.s = tmpstrdup(s);
 
 	entry->flagMode = Conf->flagMode;
 	entry->value = val;
@@ -1536,6 +1538,9 @@ nextline:
 	return;
 
 isnewformat:
+	pfree(recoded);
+	pfree(pstr);
+
 	if (oldformat)
 		ereport(ERROR,
 				(errcode(ERRCODE_CONFIG_FILE_ERROR),
diff --git a/contrib/dict_int/dict_int.c b/contrib/dict_int/dict_int.c
index 56ede37089..e11d1129e9 100644
--- a/contrib/dict_int/dict_int.c
+++ b/contrib/dict_int/dict_int.c
@@ -30,7 +30,7 @@ PG_FUNCTION_INFO_V1(dintdict_lexize);
 Datum
 dintdict_init(PG_FUNCTION_ARGS)
 {
-	List	   *dictoptions = (List *) PG_GETARG_POINTER(0);
+	DictInitData *init_data = (DictInitData *) PG_GETARG_POINTER(0);
 	DictInt    *d;
 	ListCell   *l;
 
@@ -38,7 +38,7 @@ dintdict_init(PG_FUNCTION_ARGS)
 	d->maxlen = 6;
 	d->rejectlong = false;
 
-	foreach(l, dictoptions)
+	foreach(l, init_data->dictoptions)
 	{
 		DefElem    *defel = (DefElem *) lfirst(l);
diff --git a/contrib/dict_xsyn/dict_xsyn.c b/contrib/dict_xsyn/dict_xsyn.c
index a79ece240c..c3146bae3c 100644
--- a/contrib/dict_xsyn/dict_xsyn.c
+++ b/contrib/dict_xsyn/dict_xsyn.c
@@ -140,7 +140,7 @@ read_dictionary(DictSyn *d, const char *filename)
 Datum
 dxsyn_init(PG_FUNCTION_ARGS)
 {
-	List	   *dictoptions = (List *) PG_GETARG_POINTER(0);
+	DictInitData *init_data = (DictInitData *) PG_GETARG_POINTER(0);
 	DictSyn    *d;
 	ListCell   *l;
 	char	   *filename = NULL;
@@ -153,7 +153,7 @@ dxsyn_init(PG_FUNCTION_ARGS)
 	d->matchsynonyms = false;
 	d->keepsynonyms = true;
 
-	foreach(l, dictoptions)
+	foreach(l, init_data->dictoptions)
 	{
 		DefElem    *defel = (DefElem *) lfirst(l);
diff --git a/contrib/unaccent/unaccent.c b/contrib/unaccent/unaccent.c
index 247c202755..2e66331ed8 100644
--- a/contrib/unaccent/unaccent.c
+++ b/contrib/unaccent/unaccent.c
@@ -267,12 +267,12 @@ PG_FUNCTION_INFO_V1(unaccent_init);
 Datum
 unaccent_init(PG_FUNCTION_ARGS)
 {
-	List	   *dictoptions = (List *) PG_GETARG_POINTER(0);
+	DictInitData *init_data = (DictInitData *) PG_GETARG_POINTER(0);
 	TrieChar   *rootTrie = NULL;
 	bool		fileloaded = false;
 	ListCell   *l;
 
-	foreach(l, dictoptions)
+	foreach(l, init_data->dictoptions)
 	{
 		DefElem    *defel = (DefElem *) lfirst(l);
Re: [PROPOSAL] Shared Ispell dictionaries
On Tue, Mar 20, 2018 at 09:30:15PM +0100, Tomas Vondra wrote:
> On 03/20/2018 02:11 PM, Arthur Zakirov wrote:
> > max_shared_dictionaries_size is defined as PGC_SIGHUP now. Added a check
> > of the new value to disallow setting it to zero if there are loaded
> > dictionaries and to disallow decreasing the maximum allowed size below
> > the currently loaded size.
>
> I wonder if these restrictions are needed? I mean, why not allow setting
> max_shared_dictionaries_size below the size of the loaded dictionaries?
>
> Of course, on the one hand those restrictions seem sensible. On the other
> hand, perhaps in some cases it would be useful to allow violating them?
>
> I mean, why not simply disable loading of new dictionaries when
>
>     (max_shared_dictionaries_size < loaded_size)
>
> Maybe I'm over-thinking this though. It's probably safer and less
> surprising to enforce the restrictions.

Hm, yes, in some cases this check may be over-engineering. I thought it
was reasonable and safer in the v7 patch. But there are similar GUCs,
wal_keep_segments and max_wal_size, which don't do additional checks,
and people are fine with them. So I removed that check from the
variable.

Please find the attached new version of the patch.

-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

diff --git a/src/backend/tsearch/spell.c b/src/backend/tsearch/spell.c
index b9fdd77e19..e071994523 100644
--- a/src/backend/tsearch/spell.c
+++ b/src/backend/tsearch/spell.c
@@ -78,6 +78,8 @@
 #define tmpalloc(sz)  MemoryContextAlloc(Conf->buildCxt, (sz))
 #define tmpalloc0(sz)  MemoryContextAllocZero(Conf->buildCxt, (sz))
 
+#define tmpstrdup(str)  MemoryContextStrdup(Conf->buildCxt, (str))
+
 /*
  * Prepare for constructing an ISpell dictionary.
  *
@@ -498,7 +500,7 @@ NIAddSpell(IspellDict *Conf, const char *word, const char *flag)
 	Conf->Spell[Conf->nspell] = (SPELL *) tmpalloc(SPELLHDRSZ + strlen(word) + 1);
 	strcpy(Conf->Spell[Conf->nspell]->word, word);
 	Conf->Spell[Conf->nspell]->p.flag = (*flag != '\0')
-		? cpstrdup(Conf, flag) : VoidString;
+		? tmpstrdup(flag) : VoidString;
 	Conf->nspell++;
 }
 
@@ -1040,7 +1042,7 @@ setCompoundAffixFlagValue(IspellDict *Conf, CompoundAffixFlag *entry,
 		entry->flag.i = i;
 	}
 	else
-		entry->flag.s = cpstrdup(Conf, s);
+		entry->flag.s = tmpstrdup(s);
 
 	entry->flagMode = Conf->flagMode;
 	entry->value = val;
@@ -1536,6 +1538,9 @@ nextline:
 	return;
 
 isnewformat:
+	pfree(recoded);
+	pfree(pstr);
+
 	if (oldformat)
 		ereport(ERROR,
 				(errcode(ERRCODE_CONFIG_FILE_ERROR),
diff --git a/contrib/dict_int/dict_int.c b/contrib/dict_int/dict_int.c
index 56ede37089..e11d1129e9 100644
--- a/contrib/dict_int/dict_int.c
+++ b/contrib/dict_int/dict_int.c
@@ -30,7 +30,7 @@ PG_FUNCTION_INFO_V1(dintdict_lexize);
 Datum
 dintdict_init(PG_FUNCTION_ARGS)
 {
-	List	   *dictoptions = (List *) PG_GETARG_POINTER(0);
+	DictInitData *init_data = (DictInitData *) PG_GETARG_POINTER(0);
 	DictInt    *d;
 	ListCell   *l;
 
@@ -38,7 +38,7 @@ dintdict_init(PG_FUNCTION_ARGS)
 	d->maxlen = 6;
 	d->rejectlong = false;
 
-	foreach(l, dictoptions)
+	foreach(l, init_data->dictoptions)
 	{
 		DefElem    *defel = (DefElem *) lfirst(l);
diff --git a/contrib/dict_xsyn/dict_xsyn.c b/contrib/dict_xsyn/dict_xsyn.c
index a79ece240c..c3146bae3c 100644
--- a/contrib/dict_xsyn/dict_xsyn.c
+++ b/contrib/dict_xsyn/dict_xsyn.c
@@ -140,7 +140,7 @@ read_dictionary(DictSyn *d, const char *filename)
 Datum
 dxsyn_init(PG_FUNCTION_ARGS)
 {
-	List	   *dictoptions = (List *) PG_GETARG_POINTER(0);
+	DictInitData *init_data = (DictInitData *) PG_GETARG_POINTER(0);
 	DictSyn    *d;
 	ListCell   *l;
 	char	   *filename = NULL;
@@ -153,7 +153,7 @@ dxsyn_init(PG_FUNCTION_ARGS)
 	d->matchsynonyms = false;
 	d->keepsynonyms = true;
 
-	foreach(l, dictoptions)
+	foreach(l, init_data->dictoptions)
 	{
 		DefElem    *defel = (DefElem *) lfirst(l);
diff --git a/contrib/unaccent/unaccent.c b/contrib/unaccent/unaccent.c
index 247c202755..2e66331ed8 100644
--- a/contrib/unaccent/unaccent.c
+++ b/contrib/unaccent/unaccent.c
@@ -267,12 +267,12 @@ PG_FUNCTION_INFO_V1(unaccent_init);
 Datum
 unaccent_init(PG_FUNCTION_ARGS)
 {
-	List	   *dictoptions = (List *) PG_GETARG_POINTER(0);
+	DictInitData *init_data = (DictInitData *) PG_GETARG_POINTER(0);
 	TrieChar   *rootTrie = NULL;
 	bool		fileloaded = false;
 	ListCell   *l;
 
-	foreach(l, dictoptions)
+	foreach(l, init_data->dictoptions)
 	{
 		DefElem    *defel = (DefElem *) lfirst(l);
diff --git a/src/backend/commands/tsearchcmds.c b/src/backend/commands/tsearchcmds.c
Re: [PROPOSAL] Shared Ispell dictionaries
On 03/20/2018 02:11 PM, Arthur Zakirov wrote:
> Hello,
>
> On Mon, Mar 19, 2018 at 08:50:46PM +0100, Tomas Vondra wrote:
>> Hi Arthur,
>>
>> I went through the patch - just skimming through the diffs, will do more
>> testing tomorrow. Here are a few initial comments.
>
> Thank you for the review!
>
>> 1) max_shared_dictionaries_size / PGC_POSTMASTER
>>
>> I'm not quite sure why the GUC is defined as PGC_POSTMASTER, i.e. why it
>> can't be changed after server start. That seems like a fairly useful
>> thing to do (e.g. increase the limit while the server is running), and
>> after looking at the code I think it shouldn't be difficult to change.
>
> max_shared_dictionaries_size is defined as PGC_SIGHUP now. Added a check
> of the new value to disallow setting it to zero if there are loaded
> dictionaries and to disallow decreasing the maximum allowed size below
> the currently loaded size.

I wonder if these restrictions are needed? I mean, why not allow setting
max_shared_dictionaries_size below the size of the loaded dictionaries?

Of course, on the one hand those restrictions seem sensible. On the other
hand, perhaps in some cases it would be useful to allow violating them?

I mean, why not simply disable loading of new dictionaries when

    (max_shared_dictionaries_size < loaded_size)

Maybe I'm over-thinking this though. It's probably safer and less
surprising to enforce the restrictions.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PROPOSAL] Shared Ispell dictionaries
Hello,

On Mon, Mar 19, 2018 at 08:50:46PM +0100, Tomas Vondra wrote:
> Hi Arthur,
>
> I went through the patch - just skimming through the diffs, will do more
> testing tomorrow. Here are a few initial comments.

Thank you for the review!

> 1) max_shared_dictionaries_size / PGC_POSTMASTER
>
> I'm not quite sure why the GUC is defined as PGC_POSTMASTER, i.e. why it
> can't be changed after server start. That seems like a fairly useful
> thing to do (e.g. increase the limit while the server is running), and
> after looking at the code I think it shouldn't be difficult to change.

max_shared_dictionaries_size is defined as PGC_SIGHUP now. Added a check
of the new value to disallow setting it to zero if there are loaded
dictionaries and to disallow decreasing the maximum allowed size below
the currently loaded size.

> The other thing I'd suggest is handling "-1" as "no limit".

I added the ability to set '-1'. Fixed some comments and the
documentation.

> 2) max_shared_dictionaries_size / size of number
>
> Some of the comments dealing with the GUC treat it as a number of
> dictionaries (instead of a size). I suppose that's due to how the
> original patch was implemented.

Fixed. Should be good now.

> 3) Assert(max_shared_dictionaries_size);
>
> I'd say that assert is not very clear - it should be
>
>     Assert(max_shared_dictionaries_size > 0);
>
> or something along those lines. It's also a good idea to add a comment
> explaining the assert, say
>
>     /* we can only get here when shared dictionaries are enabled */
>     Assert(max_shared_dictionaries_size > 0);

Fixed the assert and added the comment. I extended the assert so that it
also takes the -1 value into account.

> 4) I took the liberty of rewording some of the docs/comments. See the
> attached diffs, that should apply on top of the 0003 and 0004 patches.
> Please, treat those as mere suggestions.

I applied your diffs and added changes for max_shared_dictionaries_size.

Please find the attached new version of the patch.
-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

diff --git a/src/backend/tsearch/spell.c b/src/backend/tsearch/spell.c
index b9fdd77e19..e071994523 100644
--- a/src/backend/tsearch/spell.c
+++ b/src/backend/tsearch/spell.c
@@ -78,6 +78,8 @@
 #define tmpalloc(sz)  MemoryContextAlloc(Conf->buildCxt, (sz))
 #define tmpalloc0(sz)  MemoryContextAllocZero(Conf->buildCxt, (sz))
 
+#define tmpstrdup(str)  MemoryContextStrdup(Conf->buildCxt, (str))
+
 /*
  * Prepare for constructing an ISpell dictionary.
  *
@@ -498,7 +500,7 @@ NIAddSpell(IspellDict *Conf, const char *word, const char *flag)
 	Conf->Spell[Conf->nspell] = (SPELL *) tmpalloc(SPELLHDRSZ + strlen(word) + 1);
 	strcpy(Conf->Spell[Conf->nspell]->word, word);
 	Conf->Spell[Conf->nspell]->p.flag = (*flag != '\0')
-		? cpstrdup(Conf, flag) : VoidString;
+		? tmpstrdup(flag) : VoidString;
 	Conf->nspell++;
 }
 
@@ -1040,7 +1042,7 @@ setCompoundAffixFlagValue(IspellDict *Conf, CompoundAffixFlag *entry,
 		entry->flag.i = i;
 	}
 	else
-		entry->flag.s = cpstrdup(Conf, s);
+		entry->flag.s = tmpstrdup(s);
 
 	entry->flagMode = Conf->flagMode;
 	entry->value = val;
@@ -1536,6 +1538,9 @@ nextline:
 	return;
 
 isnewformat:
+	pfree(recoded);
+	pfree(pstr);
+
 	if (oldformat)
 		ereport(ERROR,
 				(errcode(ERRCODE_CONFIG_FILE_ERROR),
diff --git a/contrib/dict_int/dict_int.c b/contrib/dict_int/dict_int.c
index 56ede37089..e11d1129e9 100644
--- a/contrib/dict_int/dict_int.c
+++ b/contrib/dict_int/dict_int.c
@@ -30,7 +30,7 @@ PG_FUNCTION_INFO_V1(dintdict_lexize);
 Datum
 dintdict_init(PG_FUNCTION_ARGS)
 {
-	List	   *dictoptions = (List *) PG_GETARG_POINTER(0);
+	DictInitData *init_data = (DictInitData *) PG_GETARG_POINTER(0);
 	DictInt    *d;
 	ListCell   *l;
 
@@ -38,7 +38,7 @@ dintdict_init(PG_FUNCTION_ARGS)
 	d->maxlen = 6;
 	d->rejectlong = false;
 
-	foreach(l, dictoptions)
+	foreach(l, init_data->dictoptions)
 	{
 		DefElem    *defel = (DefElem *) lfirst(l);
diff --git a/contrib/dict_xsyn/dict_xsyn.c b/contrib/dict_xsyn/dict_xsyn.c
index a79ece240c..c3146bae3c 100644
--- a/contrib/dict_xsyn/dict_xsyn.c
+++ b/contrib/dict_xsyn/dict_xsyn.c
@@ -140,7 +140,7 @@ read_dictionary(DictSyn *d, const char *filename)
 Datum
 dxsyn_init(PG_FUNCTION_ARGS)
 {
-	List	   *dictoptions = (List *) PG_GETARG_POINTER(0);
+	DictInitData *init_data = (DictInitData *) PG_GETARG_POINTER(0);
 	DictSyn    *d;
 	ListCell   *l;
 	char	   *filename = NULL;
@@ -153,7 +153,7 @@ dxsyn_init(PG_FUNCTION_ARGS)
 	d->matchsynonyms = false;
 	d->keepsynonyms = true;
 
-	foreach(l, dictoptions)
+	foreach(l, init_data->dictoptions)
 	{
 		DefElem    *defel = (DefElem *)
Re: [PROPOSAL] Shared Ispell dictionaries
Hi Arthur, I went through the patch - just skimming through the diffs, will do more testing tomorrow. Here are a few initial comments. 1) max_shared_dictionaries_size / PGC_POSTMASTER I'm not quite sure why the GUC is defined as PGC_POSTMASTER, i.e. why it can't be changed after server start. That seems like a fairly useful thing to do (e.g. increase the limit while the server is running), and after looking at the code I think it shouldn't be difficult to change. The other thing I'd suggest is handling "-1" as "no limit". 2) max_shared_dictionaries_size / size of number Some of the comments dealing with the GUC treat it as a number of dictionaries (instead of a size). I suppose that's due to how the original patch was implemented. 3) Assert(max_shared_dictionaries_size); I'd say that assert is not very clear - it should be Assert(max_shared_dictionaries_size > 0); or something along those lines. It's also a good idea to add a comment explaining the assert, say /* we can only get here when shared dictionaries are enabled */ Assert(max_shared_dictionaries_size > 0); 4) I took the liberty of rewording some of the docs/comments. See the attached diffs, which should apply on top of the 0003 and 0004 patches. Please treat those as mere suggestions. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index 6862d5e..6747fe2 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -1433,23 +1433,23 @@ include_dir 'conf.d' -Sets the maximum size of all text search dictionaries loaded into shared -memory. The default is 100 megabytes (100MB). This -parameter can only be set at server start. +Specifies the amount of shared memory to be used to store full-text +search dictionaries. The default is 100 megabytes (100MB). +This parameter can only be set at server start. -Currently controls only loading of Ispell -dictionaries (see ). 
-After compiling the dictionary it will be copied into shared memory. -Another backends on first use of the dictionary will use it from shared -memory, so it doesn't need to compile the dictionary second time. +Currently only Ispell dictionaries (see +) may be loaded into +shared memory. The first backend requesting the dictionary will +build it and copy it into shared memory, so that other backends can +reuse it without having to build it again. -If total size of simultaneously loaded dictionaries reaches the maximum -allowed size then a new dictionary will be loaded into local memory of -a backend. +If the size of simultaneously loaded dictionaries reaches the maximum +allowed size, additional dictionaries will be loaded into private backend +memory (effectively disabling the sharing). diff --git a/src/backend/tsearch/ts_shared.c b/src/backend/tsearch/ts_shared.c index bfc5292..22d58a0 100644 --- a/src/backend/tsearch/ts_shared.c +++ b/src/backend/tsearch/ts_shared.c @@ -22,7 +22,7 @@ /* - * Hash table structures + * Hash table entries representing shared dictionaries. */ typedef struct { @@ -37,7 +37,8 @@ typedef struct static dshash_table *dict_table = NULL; /* - * Shared struct for locking + * Information about the main shmem segment, used to coordinate + * access to the hash table and dictionaries. */ typedef struct { @@ -53,8 +54,8 @@ typedef struct static TsearchCtlData *tsearch_ctl; /* - * GUC variable for maximum number of shared dictionaries. Default value is - * 100MB. + * Maximum allowed amount of shared memory for shared dictionaries, + * in kilobytes. Default value is 100MB. */ int max_shared_dictionaries_size = 100 * 1024; @@ -213,7 +202,7 @@ ts_dict_shmem_location(DictInitData *initoptions, /* * Release memory occupied by the dictionary. Function just unpins DSM mapping. - * If nobody else hasn't mapping to this DSM then unping DSM segment. + * If nobody else has a mapping to this DSM, unpin the DSM segment. * * dictid: Oid of the dictionary. 
*/ @@ -312,6 +301,7 @@ init_dict_table(void) MemoryContext old_context; dsa_area *dsa; + /* bail out if shared dictionaries not allowed */ if (max_shared_dictionaries_size == 0) return; diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c index 172627a..b10ec48 100644 --- a/src/backend/utils/misc/guc.c +++ b/src/backend/utils/misc/guc.c @@ -2939,7 +2939,7 @@ static struct config_int ConfigureNamesInt[] = gettext_noop("Currently controls only loading of Ispell dictionaries. " "If total size of simultaneously loaded dictionaries " "reaches the maximum allowed size then a new dictionary " -
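The two GUC suggestions above - making the limit changeable after server start, and treating "-1" as "no limit" - could be sketched as a guc.c entry roughly like the following. This is only an illustration: the PGC_SIGHUP context, group, bounds, and hook slots are assumptions, not taken from the actual patch.

```c
/* Hypothetical config_int entry in guc.c; all field values illustrative. */
{
    {"max_shared_dictionaries_size", PGC_SIGHUP, RESOURCES_MEM,
        gettext_noop("Sets the maximum size of all text search dictionaries "
                     "loaded into shared memory."),
        gettext_noop("-1 means no limit, 0 disables loading dictionaries "
                     "into shared memory."),
        GUC_UNIT_KB
    },
    &max_shared_dictionaries_size,
    100 * 1024,                 /* default: 100MB */
    -1, INT_MAX,                /* -1 = no limit */
    NULL, NULL, NULL            /* check/assign/show hooks unused */
},
```

With PGC_SIGHUP the limit could be raised with a config reload; lowering it at runtime would of course not shrink dictionaries already loaded, only gate new loads.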
Re: [PROPOSAL] Shared Ispell dictionaries
On 03/19/2018 07:07 PM, Andres Freund wrote: > On 2018-03-19 14:52:34 +0100, Tomas Vondra wrote: >> On 03/19/2018 02:34 AM, Andres Freund wrote: >>> Hi, >>> >>> On 2018-03-19 01:52:41 +0100, Tomas Vondra wrote: I do agree with that. We have a working well-understood dsm-based solution, addressing the goals initially explained in this thread. >>> >>> Well, it's also awkward and manual to use. I do think that's >>> something we've to pay attention to. >>> >> >> Awkward in what sense? > > You've to manually configure a setting that can only be set at server > start. You can't set it as big as necessary because it might use up > memory better used for other things. It needs the full space for > dictionaries even if the majority of it never will be needed. All of > those aren't needed in an mmap world. > Which is not quite true, because that's not what the patch does. Each dictionary is loaded into a separate dsm segment when needed, which is then stored in a dhash table. So most of what you wrote is not really true - the patch does not pre-allocate the space, and the setting might be set even after server start (it's not defined like that currently, but that should be trivial to change). > >> So, I'm not at all convinced the mmap approach is actually better >> than the dsm one. And I believe that if we come up with a good way >> to automate some of the tasks, I don't see why would that be >> possible in the mmap and not dsm. > > To me it seems we'll end up needing a heck of a lot more code that > the OS already implements if we do it ourselves. > Like what? Which features do you expect to need much more code? The automated reloading will need a fairly small amount of code - the main issue is deciding when to reload, and as I mentioned before that's more complicated than you seem to believe. In fact, it may not even be possible - there's no way to decide if all files are already updated. 
Currently we kinda ignore that, on the assumption that dictionaries change only rarely. We may do the same thing and reload the dict if at least one file changes. In any case, the amount of code is trivial. In fact, it may be more complicated in the mmap case - how do you update a dictionary that is already mapped into multiple processes? The eviction is harder - I'll give you that. But then again, I'm not sure the mmap approach is really what we want here - it seems better to evict a whole dictionary than some random pages from many of them. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
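The "reload if at least one file changes" rule discussed above can be made concrete with a small mtime check over the three source files. This is a standalone sketch, not PostgreSQL code; the struct and function names are illustrative only.

```c
#include <stdbool.h>
#include <stdio.h>
#include <sys/stat.h>
#include <time.h>

/*
 * Hypothetical record of the mtimes captured when the dictionary was
 * last (re)built.  All names here are illustrative, not from the patch.
 */
typedef struct DictFileStamps
{
    time_t      dict_mtime;
    time_t      affix_mtime;
    time_t      stop_mtime;
} DictFileStamps;

/* Return the mtime of a file, or 0 if it cannot be stat'ed. */
static time_t
file_mtime(const char *path)
{
    struct stat st;

    if (path == NULL || stat(path, &st) != 0)
        return 0;
    return st.st_mtime;
}

/*
 * A dictionary is considered stale if *any* of the three source files
 * changed, since reloading on a partial update could produce broken
 * results (e.g. a new .dict paired with an old .affix).
 */
static bool
dict_files_changed(const DictFileStamps *stamps,
                   const char *dictfile,
                   const char *affixfile,
                   const char *stopfile)
{
    return file_mtime(dictfile) != stamps->dict_mtime ||
        file_mtime(affixfile) != stamps->affix_mtime ||
        file_mtime(stopfile) != stamps->stop_mtime;
}
```

Note this only detects that a change happened; it cannot tell whether all three files have finished being updated, which is the race Tomas points out above.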
Re: [PROPOSAL] Shared Ispell dictionaries
On 2018-03-19 14:52:34 +0100, Tomas Vondra wrote: > On 03/19/2018 02:34 AM, Andres Freund wrote: > > Hi, > > > > On 2018-03-19 01:52:41 +0100, Tomas Vondra wrote: > >> I do agree with that. We have a working well-understood dsm-based > >> solution, addressing the goals initially explained in this thread. > > > > Well, it's also awkward and manual to use. I do think that's > > something we've to pay attention to. > > > > Awkward in what sense? You've to manually configure a setting that can only be set at server start. You can't set it as big as necessary because it might use up memory better used for other things. It needs the full space for dictionaries even if the majority of it never will be needed. All of those aren't needed in an mmap world. > So, I'm not at all convinced the mmap approach is actually better than > the dsm one. And I believe that if we come up with a good way to > automate some of the tasks, I don't see why would that be possible in > the mmap and not dsm. To me it seems we'll end up needing a heck of a lot more code that the OS already implements if we do it ourselves. Greetings, Andres Freund
Re: [PROPOSAL] Shared Ispell dictionaries
On 03/19/2018 02:34 AM, Andres Freund wrote: > Hi, > > On 2018-03-19 01:52:41 +0100, Tomas Vondra wrote: >> I do agree with that. We have a working well-understood dsm-based >> solution, addressing the goals initially explained in this thread. > > Well, it's also awkward and manual to use. I do think that's > something we've to pay attention to. > Awkward in what sense? I don't think the manual aspect is an issue. Currently we have no way to reload the dictionary, except for restarting all the backends. I don't see that as a particularly convenient solution. Also, this is pretty much how the shared_ispell extension works, although you might argue that was more due to the limitation of how shared memory could be used in extensions before DSM was introduced. In any case, I've never heard complaints about this aspect of the extension. There are two things that might be automated - reloading of dictionaries and evicting them when hitting the memory limit. I have tried to implement that in the shared_ispell dictionary but it's a bit more complicated than it looks. For example, it seems obvious to reload the dictionary when the file timestamp changes. But in fact there are three files - dict, affixes, stopwords. So will you reload when a single file changes? All of them? Keep in mind that the new version of the dictionary may use different affixes, so a reload at the wrong moment may produce broken results. > >> I wonder how much of this patch would be affected by the switch >> from dsm to mmap? I guess the memory limit would get mostly >> irrelevant (mmap would rely on the OS to page the memory in/out >> depending on memory pressure), and so would the UNLOAD/RELOAD >> commands (because each backend would do it's own mmap). > > Those seem fairly major. > I'm not sure I'd say those are major. And you might also see the lack of these capabilities as negative points for the mmap approach. So, I'm not at all convinced the mmap approach is actually better than the dsm one. 
And I believe that if we come up with a good way to automate some of the tasks, I don't see why would that be possible in the mmap and not dsm. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PROPOSAL] Shared Ispell dictionaries
On Mon, 19 Mar 2018 14:06:50 +0300 Arthur Zakirovwrote: > > I beleive mmap requires completely rewrite 0003 part of the patch and > a little changes in 0005. > > > In any case, I suggest to polish the dsm-based patch, and see if we > > can get that one into PG11. > > Yes we have more time in future commitfests if dsm-based patch won't > be approved. > Hi, I'm not sure about mmap approach, it would just bring another problems. I like the dsm approach because it's not inventing any new files in the database, when mmap approach will possibly require new folder in data directory and management above bunch of new files, with additional issues related with pg_upgrade and etc. Also in dsm approach if someone needs to update dictionaries then he (or his package manager) can just replace files and be done with it. -- --- Ildus Kurbangaliev Postgres Professional: http://www.postgrespro.com Russian Postgres Company
Re: [PROPOSAL] Shared Ispell dictionaries
Arthur Zakirov wrote: > I've planned only to improve the documentation a little. Also it seems I > should change 0004 part, I found that extension upgrade scripts may be made > in wrong way. I've attached a new version of the patch. In this version I removed 0004-Update-tmplinit-arguments-v6.patch. In my opinion it handled extension upgrades in the wrong way. If I'm not mistaken, there is currently no way to upgrade a template's init function signature, and I didn't find a way to change init_method(internal) to init_method(internal, internal) within an extension's upgrade script. Therefore I added 0002-Change-tmplinit-argument-v7.patch. Now a DictInitData struct is passed to a template's init method. It contains the necessary data: dictoptions and dictid. And there is no need to change the method's signature. The other parts of the patch are the same, except that they now use the DictInitData structure. On Mon, Mar 19, 2018 at 01:52:41AM +0100, Tomas Vondra wrote: > I wonder how much of this patch would be affected by the switch from dsm > to mmap? I guess the memory limit would get mostly irrelevant (mmap > would rely on the OS to page the memory in/out depending on memory > pressure), and so would the UNLOAD/RELOAD commands (because each backend > would do it's own mmap). I believe mmap requires completely rewriting the 0003 part of the patch and small changes in 0005. > In any case, I suggest to polish the dsm-based patch, and see if we can > get that one into PG11. Yes, we'll have more time in future commitfests if the dsm-based patch isn't approved. 
-- Arthur Zakirov Postgres Professional: http://www.postgrespro.com Russian Postgres Company diff --git a/src/backend/tsearch/spell.c b/src/backend/tsearch/spell.c index b9fdd77e19..e071994523 100644 --- a/src/backend/tsearch/spell.c +++ b/src/backend/tsearch/spell.c @@ -78,6 +78,8 @@ #define tmpalloc(sz) MemoryContextAlloc(Conf->buildCxt, (sz)) #define tmpalloc0(sz) MemoryContextAllocZero(Conf->buildCxt, (sz)) +#define tmpstrdup(str) MemoryContextStrdup(Conf->buildCxt, (str)) + /* * Prepare for constructing an ISpell dictionary. * @@ -498,7 +500,7 @@ NIAddSpell(IspellDict *Conf, const char *word, const char *flag) Conf->Spell[Conf->nspell] = (SPELL *) tmpalloc(SPELLHDRSZ + strlen(word) + 1); strcpy(Conf->Spell[Conf->nspell]->word, word); Conf->Spell[Conf->nspell]->p.flag = (*flag != '\0') - ? cpstrdup(Conf, flag) : VoidString; + ? tmpstrdup(flag) : VoidString; Conf->nspell++; } @@ -1040,7 +1042,7 @@ setCompoundAffixFlagValue(IspellDict *Conf, CompoundAffixFlag *entry, entry->flag.i = i; } else - entry->flag.s = cpstrdup(Conf, s); + entry->flag.s = tmpstrdup(s); entry->flagMode = Conf->flagMode; entry->value = val; @@ -1536,6 +1538,9 @@ nextline: return; isnewformat: + pfree(recoded); + pfree(pstr); + if (oldformat) ereport(ERROR, (errcode(ERRCODE_CONFIG_FILE_ERROR), diff --git a/contrib/dict_int/dict_int.c b/contrib/dict_int/dict_int.c index 56ede37089..e11d1129e9 100644 --- a/contrib/dict_int/dict_int.c +++ b/contrib/dict_int/dict_int.c @@ -30,7 +30,7 @@ PG_FUNCTION_INFO_V1(dintdict_lexize); Datum dintdict_init(PG_FUNCTION_ARGS) { - List *dictoptions = (List *) PG_GETARG_POINTER(0); + DictInitData *init_data = (DictInitData *) PG_GETARG_POINTER(0); DictInt*d; ListCell *l; @@ -38,7 +38,7 @@ dintdict_init(PG_FUNCTION_ARGS) d->maxlen = 6; d->rejectlong = false; - foreach(l, dictoptions) + foreach(l, init_data->dictoptions) { DefElem*defel = (DefElem *) lfirst(l); diff --git a/contrib/dict_xsyn/dict_xsyn.c b/contrib/dict_xsyn/dict_xsyn.c index a79ece240c..c3146bae3c 
100644 --- a/contrib/dict_xsyn/dict_xsyn.c +++ b/contrib/dict_xsyn/dict_xsyn.c @@ -140,7 +140,7 @@ read_dictionary(DictSyn *d, const char *filename) Datum dxsyn_init(PG_FUNCTION_ARGS) { - List *dictoptions = (List *) PG_GETARG_POINTER(0); + DictInitData *init_data = (DictInitData *) PG_GETARG_POINTER(0); DictSyn*d; ListCell *l; char *filename = NULL; @@ -153,7 +153,7 @@ dxsyn_init(PG_FUNCTION_ARGS) d->matchsynonyms = false; d->keepsynonyms = true; - foreach(l, dictoptions) + foreach(l, init_data->dictoptions) { DefElem*defel = (DefElem *) lfirst(l); diff --git a/contrib/unaccent/unaccent.c b/contrib/unaccent/unaccent.c index 247c202755..2e66331ed8 100644 --- a/contrib/unaccent/unaccent.c +++ b/contrib/unaccent/unaccent.c @@ -267,12 +267,12 @@ PG_FUNCTION_INFO_V1(unaccent_init); Datum unaccent_init(PG_FUNCTION_ARGS) { - List *dictoptions = (List *) PG_GETARG_POINTER(0); + DictInitData *init_data = (DictInitData *) PG_GETARG_POINTER(0); TrieChar
Re: [PROPOSAL] Shared Ispell dictionaries
Hi, On 2018-03-19 01:52:41 +0100, Tomas Vondra wrote: > I do agree with that. We have a working well-understood dsm-based > solution, addressing the goals initially explained in this thread. Well, it's also awkward and manual to use. I do think that's something we've to pay attention to. > I wonder how much of this patch would be affected by the switch from dsm > to mmap? I guess the memory limit would get mostly irrelevant (mmap > would rely on the OS to page the memory in/out depending on memory > pressure), and so would the UNLOAD/RELOAD commands (because each backend > would do it's own mmap). Those seem fairly major. Greetings, Andres Freund
Re: [PROPOSAL] Shared Ispell dictionaries
On 03/17/2018 05:43 AM, Arthur Zakirov wrote: > Hello Tomas, > > Arthur, what are your plans with this patch in the current CF? > > > I think dsm-based approach is in good shape already and works nice. > I've planned only to improve the documentation a little. Also it seems I > should change 0004 part, I found that extension upgrade scripts may be > made in wrong way. > In my opinion RELOAD and UNLOAD commands can be made in next commitfest > (2018-09). > Did you look it? Have you arguments about how shared memory allocation > and releasing functions are made? > > > > It does not seem to be moving towards RFC very much, and reworking the > patch to use mmap() seems like a quite significant change late in the > CF. Which means it's likely to cause the patch get get bumped to the > next CF (2018-09). > > > Agree. I have a draft version for mmap-based approach which works in > platforms with mmap. In Windows it is necessary to use another API > (CreateFileMapping, etc). But this approach requires more work on > handling processed dictionary files (how name them, when remove). > > > > FWIW I am not quite sure if the mmap() approach is better than what was > implemented by the patch. I'm not sure how exactly will it behave under > memory pressure (AFAIK it goes through page cache, which means random > parts of dictionaries might get evicted) or how well is it supported on > various platforms (say, Windows). > > > Yes, as I wrote mmap-based approach requires more work. The only > benefit I see is that you don't need to process a dictionary after > server restart. I'd vote for dsm-based approach. > I do agree with that. We have a working well-understood dsm-based solution, addressing the goals initially explained in this thread. I don't see a reason to stall this patch based on a mere assumption that the mmap-based approach might be magically better in some unknown aspects. It might be, but we may as well leave that as a future work. 
I wonder how much of this patch would be affected by the switch from dsm to mmap? I guess the memory limit would get mostly irrelevant (mmap would rely on the OS to page the memory in/out depending on memory pressure), and so would the UNLOAD/RELOAD commands (because each backend would do its own mmap). In any case, I suggest polishing the dsm-based patch and seeing if we can get that one into PG11. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PROPOSAL] Shared Ispell dictionaries
Hello Tomas, > Arthur, what are your plans with this patch in the current CF? I think the dsm-based approach is in good shape already and works nicely. I've planned only to improve the documentation a little. Also it seems I should change the 0004 part; I found that the extension upgrade scripts may be done in the wrong way. In my opinion the RELOAD and UNLOAD commands can be done in the next commitfest (2018-09). Did you look at it? Do you have any comments about how the shared memory allocation and release functions are implemented? > > It does not seem to be moving towards RFC very much, and reworking the > patch to use mmap() seems like a quite significant change late in the > CF. Which means it's likely to cause the patch get get bumped to the > next CF (2018-09). Agreed. I have a draft version of the mmap-based approach which works on platforms with mmap. On Windows it is necessary to use another API (CreateFileMapping, etc). But this approach requires more work on handling the processed dictionary files (how to name them, when to remove them). > > FWIW I am not quite sure if the mmap() approach is better than what was > implemented by the patch. I'm not sure how exactly will it behave under > memory pressure (AFAIK it goes through page cache, which means random > parts of dictionaries might get evicted) or how well is it supported on > various platforms (say, Windows). > Yes, as I wrote, the mmap-based approach requires more work. The only benefit I see is that you don't need to process a dictionary after a server restart. I'd vote for the dsm-based approach. -- Arthur Zakirov Postgres Professional: http://www.postgrespro.com Russian Postgres Company
Re: [PROPOSAL] Shared Ispell dictionaries
On 03/07/2018 02:18 PM, Arthur Zakirov wrote: > On Wed, Mar 07, 2018 at 02:12:32PM +0100, Pavel Stehule wrote: >> 2018-03-07 14:10 GMT+01:00 Pavel Stehule: >>> 2018-03-07 13:58 GMT+01:00 Arthur Zakirov : Oh understood. Tomas suggested those commands too earlier. I'll implement them. But I think it is better to track files modification time too. Because now, without the patch, users don't have to call additional commands to refresh their dictionaries, so without such tracking we'll made dictionaries maintenance harder. >>> >>> Postgres hasn't any subsystem based on modification time, so >>> introduction this sensitivity, I don't see, practical. >>> >> >> Usually the shared dictionaries are used for complex language >> based fulltext. The frequence of updates of these dictionaries is >> less than updates PostgreSQL. The czech dictionary is same 10 >> years. > > Agree. In this case auto reloading isn't important feature here. > Arthur, what are your plans with this patch in the current CF? It does not seem to be moving towards RFC very much, and reworking the patch to use mmap() seems like quite a significant change late in the CF. Which means it's likely to cause the patch to get bumped to the next CF (2018-09). FWIW I am not quite sure if the mmap() approach is better than what was implemented by the patch. I'm not sure how exactly it will behave under memory pressure (AFAIK it goes through page cache, which means random parts of dictionaries might get evicted) or how well it is supported on various platforms (say, Windows). regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PROPOSAL] Shared Ispell dictionaries
On Wed, Mar 07, 2018 at 02:12:32PM +0100, Pavel Stehule wrote: > 2018-03-07 14:10 GMT+01:00 Pavel Stehule: > > 2018-03-07 13:58 GMT+01:00 Arthur Zakirov : > >> Oh understood. Tomas suggested those commands too earlier. I'll > >> implement them. But I think it is better to track files modification time > >> too. Because now, without the patch, users don't have to call additional > >> commands to refresh their dictionaries, so without such tracking we'll > >> made dictionaries maintenance harder. > >> > > > > Postgres hasn't any subsystem based on modification time, so introduction > > this sensitivity, I don't see, practical. > > > > Usually the shared dictionaries are used for complex language based > fulltext. The frequence of updates of these dictionaries is less than > updates PostgreSQL. The czech dictionary is same 10 years. Agreed. In this case auto-reloading isn't an important feature here. -- Arthur Zakirov Postgres Professional: http://www.postgrespro.com Russian Postgres Company
Re: [PROPOSAL] Shared Ispell dictionaries
2018-03-07 14:10 GMT+01:00 Pavel Stehule: > > > 2018-03-07 13:58 GMT+01:00 Arthur Zakirov : > >> On Wed, Mar 07, 2018 at 01:47:25PM +0100, Pavel Stehule wrote: >> > > Do you mean that a shared dictionary should be reloaded if its .affix >> > > and .dict files was changed? IMHO we can store last modification >> > > timestamp of them in a preprocessed file, and then we can rebuild the >> > > dictionary if files was changed. >> > > >> > >> > No, it is not necessary - just there should be commands (functions) for >> > preload dictiory and unload dictionary. >> >> Oh understood. Tomas suggested those commands too earlier. I'll >> implement them. But I think it is better to track files modification time >> too. Because now, without the patch, users don't have to call additional >> commands to refresh their dictionaries, so without such tracking we'll >> made dictionaries maintenance harder. >> > > Postgres hasn't any subsystem based on modification time, so introduction > this sensitivity, I don't see, practical. > Usually shared dictionaries are used for complex language-based full-text search. These dictionaries are updated less frequently than PostgreSQL itself. The Czech dictionary has been the same for 10 years. Regards Pavel > > Regards > > Pavel > > > >> >> -- >> Arthur Zakirov >> Postgres Professional: http://www.postgrespro.com >> Russian Postgres Company >> > >
Re: [PROPOSAL] Shared Ispell dictionaries
2018-03-07 13:58 GMT+01:00 Arthur Zakirov: > On Wed, Mar 07, 2018 at 01:47:25PM +0100, Pavel Stehule wrote: > > > Do you mean that a shared dictionary should be reloaded if its .affix > > > and .dict files was changed? IMHO we can store last modification > > > timestamp of them in a preprocessed file, and then we can rebuild the > > > dictionary if files was changed. > > > > > > > No, it is not necessary - just there should be commands (functions) for > > preload dictiory and unload dictionary. > > Oh understood. Tomas suggested those commands too earlier. I'll > implement them. But I think it is better to track files modification time > too. Because now, without the patch, users don't have to call additional > commands to refresh their dictionaries, so without such tracking we'll > made dictionaries maintenance harder. > Postgres doesn't have any subsystem based on modification time, so I don't see introducing this sensitivity as practical. Regards Pavel > > -- > Arthur Zakirov > Postgres Professional: http://www.postgrespro.com > Russian Postgres Company >
Re: [PROPOSAL] Shared Ispell dictionaries
2018-03-07 13:43 GMT+01:00 Arthur Zakirov: > On Wed, Mar 07, 2018 at 01:02:07PM +0100, Pavel Stehule wrote: > > > Understand. I'm not againts the mmap() approach, just I have lack of > > > understanding mmap() benefits... Current shared Ispell approach requires > > > preprocessing after server restarting, and the main advantage of mmap() > > > here > > > is that mmap() doesn't require preprocessing after restarting. > > > > > > Speaking about the implementation. > > > > > > It seems that the most appropriate place to store preprocessed files is > > > 'pg_dynshmem' folder. File prefix could be 'ts_dict.', otherwise > > > dsm_cleanup_for_mmap() will remove them. > > > > > > I'm not sure about reusing dsm_impl_mmap() and dsm_impl_windows(). But > > > maybe it's worth to reuse them. > > > > > > > I don't think so serialization to file (mmap) has not too sense. But the > > shared dictionary should loaded every time, and should be released every > > time if it is possible.Maybe there can be some background worker, that > > holds dictionary in memory. > > Do you mean that a shared dictionary should be reloaded if its .affix > and .dict files was changed? IMHO we can store last modification > timestamp of them in a preprocessed file, and then we can rebuild the > dictionary if files was changed. > No, it is not necessary - there should just be commands (functions) to preload and unload a dictionary. > > -- > Arthur Zakirov > Postgres Professional: http://www.postgrespro.com > Russian Postgres Company >
Re: [PROPOSAL] Shared Ispell dictionaries
On Wed, Mar 07, 2018 at 10:55:29AM +0100, Tomas Vondra wrote: > On 03/07/2018 09:55 AM, Arthur Zakirov wrote: > > Hello Andres, > > > > On Thu, Mar 01, 2018 at 08:31:49PM -0800, Andres Freund wrote: > >> Is there any chance we can instead can convert dictionaries into a form > >> we can just mmap() into memory? That'd scale a lot higher and more > >> dynamicallly? > > > > To avoid misunderstanding can you please elaborate on using mmap()? The > > DSM approach looks like more simple and requires less code. Also DSM may > > use mmap() if I'm not mistaken. > > > > I think the mmap() idea is that you preprocess the dictionary, store the > result in a file, and then mmap it when needed, without the expensive > preprocessing. Understood. I'm not against the mmap() approach, it's just that I don't fully understand the benefits of mmap()... The current shared Ispell approach requires preprocessing after a server restart, and the main advantage of mmap() here is that it doesn't require that preprocessing. Speaking about the implementation: it seems that the most appropriate place to store preprocessed files is the 'pg_dynshmem' folder. The file prefix could be 'ts_dict.'; otherwise dsm_cleanup_for_mmap() will remove them. I'm not sure about reusing dsm_impl_mmap() and dsm_impl_windows(), but maybe it's worth reusing them. -- Arthur Zakirov Postgres Professional: http://www.postgrespro.com Russian Postgres Company
Re: [PROPOSAL] Shared Ispell dictionaries
On 03/07/2018 09:55 AM, Arthur Zakirov wrote: > Hello Andres, > > On Thu, Mar 01, 2018 at 08:31:49PM -0800, Andres Freund wrote: >> Is there any chance we can instead can convert dictionaries into a form >> we can just mmap() into memory? That'd scale a lot higher and more >> dynamicallly? > > To avoid misunderstanding can you please elaborate on using mmap()? The > DSM approach looks like more simple and requires less code. Also DSM may > use mmap() if I'm not mistaken. > I think the mmap() idea is that you preprocess the dictionary, store the result in a file, and then mmap it when needed, without the expensive preprocessing. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
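The preprocess-then-mmap() idea described above can be sketched in a few lines of POSIX C. This is purely illustrative: the real patch would additionally need a serialized, pointer-free dictionary format, Windows support via CreateFileMapping(), and resowner bookkeeping so mappings aren't leaked on abort.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Map a preprocessed dictionary file read-only.  On success, returns the
 * mapping and stores its size in *size; the caller munmap()s it when done.
 * Function name and error handling are illustrative, not from the patch.
 */
static void *
map_dict_file(const char *path, size_t *size)
{
    int         fd;
    struct stat st;
    void       *p;

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;
    if (fstat(fd, &st) != 0 || st.st_size <= 0)
    {
        close(fd);
        return NULL;
    }
    /* MAP_SHARED: every process mapping the file shares the page cache pages */
    p = mmap(NULL, (size_t) st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                  /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return NULL;
    *size = (size_t) st.st_size;
    return p;
}
```

Because the mapping is backed by the file, this avoids the double-buffering Robert mentions elsewhere in the thread (page cache plus DSM copy), at the cost of the eviction behavior being left to the kernel.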
Re: [PROPOSAL] Shared Ispell dictionaries
On Wed, Feb 07, 2018 at 07:28:29PM +0300, Arthur Zakirov wrote: > Here is rebased version of the patch due to changes into dict_ispell.c. > The patch itself wasn't changed. Here is rebased version of the patch due to changes within pg_proc.h. I haven't implemented a mmap prototype yet, though. -- Arthur Zakirov Postgres Professional: http://www.postgrespro.com Russian Postgres Company diff --git a/src/backend/tsearch/spell.c b/src/backend/tsearch/spell.c index b9fdd77e19..e071994523 100644 --- a/src/backend/tsearch/spell.c +++ b/src/backend/tsearch/spell.c @@ -78,6 +78,8 @@ #define tmpalloc(sz) MemoryContextAlloc(Conf->buildCxt, (sz)) #define tmpalloc0(sz) MemoryContextAllocZero(Conf->buildCxt, (sz)) +#define tmpstrdup(str) MemoryContextStrdup(Conf->buildCxt, (str)) + /* * Prepare for constructing an ISpell dictionary. * @@ -498,7 +500,7 @@ NIAddSpell(IspellDict *Conf, const char *word, const char *flag) Conf->Spell[Conf->nspell] = (SPELL *) tmpalloc(SPELLHDRSZ + strlen(word) + 1); strcpy(Conf->Spell[Conf->nspell]->word, word); Conf->Spell[Conf->nspell]->p.flag = (*flag != '\0') - ? cpstrdup(Conf, flag) : VoidString; + ? tmpstrdup(flag) : VoidString; Conf->nspell++; } @@ -1040,7 +1042,7 @@ setCompoundAffixFlagValue(IspellDict *Conf, CompoundAffixFlag *entry, entry->flag.i = i; } else - entry->flag.s = cpstrdup(Conf, s); + entry->flag.s = tmpstrdup(s); entry->flagMode = Conf->flagMode; entry->value = val; @@ -1536,6 +1538,9 @@ nextline: return; isnewformat: + pfree(recoded); + pfree(pstr); + if (oldformat) ereport(ERROR, (errcode(ERRCODE_CONFIG_FILE_ERROR), diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index 259a2d83b4..439d2cdf87 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -1364,6 +1364,35 @@ include_dir 'conf.d' + + max_shared_dictionaries_size (integer) + + max_shared_dictionaries_size configuration parameter + + + + +Sets the maximum size of all text search dictionaries loaded into shared +memory. 
The default is 100 megabytes (100MB). This +parameter can only be set at server start. + + + +Currently controls only loading of Ispell +dictionaries (see ). +After compiling the dictionary it will be copied into shared memory. +Other backends will then use it from shared memory on first use of the +dictionary, so they don't need to compile the dictionary a second time. + + + +If the total size of simultaneously loaded dictionaries reaches the maximum +allowed size, then new dictionaries will be loaded into the local memory of +a backend. + + + + huge_pages (enum) diff --git a/src/backend/commands/tsearchcmds.c b/src/backend/commands/tsearchcmds.c index 3a843512d1..b6aeae449b 100644 --- a/src/backend/commands/tsearchcmds.c +++ b/src/backend/commands/tsearchcmds.c @@ -39,6 +39,7 @@ #include "nodes/makefuncs.h" #include "parser/parse_func.h" #include "tsearch/ts_cache.h" +#include "tsearch/ts_shared.h" #include "tsearch/ts_utils.h" #include "utils/builtins.h" #include "utils/fmgroids.h" @@ -396,7 +397,8 @@ verify_dictoptions(Oid tmplId, List *dictoptions) * Call the init method and see if it complains. We don't worry about * it leaking memory, since our command will soon be over anyway. 
*/ - (void) OidFunctionCall1(initmethod, PointerGetDatum(dictoptions)); + (void) OidFunctionCall2(initmethod, PointerGetDatum(dictoptions), + ObjectIdGetDatum(InvalidOid)); } ReleaseSysCache(tup); @@ -513,6 +515,8 @@ RemoveTSDictionaryById(Oid dictId) CatalogTupleDelete(relation, &tup->t_self); + ts_dict_shmem_release(dictId); + ReleaseSysCache(tup); heap_close(relation, RowExclusiveLock); diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c index 0c86a581c0..c7dce8cac5 100644 --- a/src/backend/storage/ipc/ipci.c +++ b/src/backend/storage/ipc/ipci.c @@ -44,6 +44,7 @@ #include "storage/procsignal.h" #include "storage/sinvaladt.h" #include "storage/spin.h" +#include "tsearch/ts_shared.h" #include "utils/backend_random.h" #include "utils/snapmgr.h" @@ -150,6 +151,7 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port) size = add_size(size, SyncScanShmemSize()); size = add_size(size, AsyncShmemSize()); size = add_size(size, BackendRandomShmemSize()); + size = add_size(size, TsearchShmemSize()); #ifdef EXEC_BACKEND size =
Re: [PROPOSAL] Shared Ispell dictionaries
Hello, Thank you for your comments. On Thu, Mar 01, 2018 at 08:31:49PM -0800, Andres Freund wrote: > Hi, > > On 2018-02-07 19:28:29 +0300, Arthur Zakirov wrote: > > + { > > + {"max_shared_dictionaries_size", PGC_POSTMASTER, RESOURCES_MEM, > > + gettext_noop("Sets the maximum size of all text search > > dictionaries loaded into shared memory."), > > + gettext_noop("Currently controls only loading of Ispell > > dictionaries. " > > +"If total size of > > simultaneously loaded dictionaries " > > +"reaches the maximum allowed > > size then a new dictionary " > > +"will be loaded into local > > memory of a backend."), > > + GUC_UNIT_KB, > > + }, > > + &max_shared_dictionaries_size, > > + 100 * 1024, 0, MAX_KILOBYTES, > > + NULL, NULL, NULL > > + }, > So this uses shared memory, allocated at server start? That doesn't > seem right. Wouldn't it make more sense to have a > 'num_shared_dictionaries' GUC, and then allocate them with dsm? Or even > better not have any such limit and use a dshash table to point to > individual loaded tables? The patch already uses DSM and a dshash table. The 'max_shared_dictionaries_size' GUC was introduced after discussion with Tomas [1], to limit the amount of memory consumed by loaded dictionaries and to prevent possible memory bloat. Its default value is 100MB. There was a 'shared_dictionaries' GUC before; it was introduced because plain hash tables were used then, not dshash. I replaced the plain hash tables with dshash, removed 'shared_dictionaries' and added 'max_shared_dictionaries_size'. > Is there any chance we can instead convert dictionaries into a form > we can just mmap() into memory? That'd scale a lot higher and more > dynamically? I think the new IspellDictData structure (in 0003-Store-ispell-structures-in-shmem-v5.patch) can already be stored in a binary file and mapped into memory. But mmap() is not used in this patch yet. I can do some experiments and make a prototype. 
1 - https://www.postgresql.org/message-id/d12d9395-922c-64c9-c87d-dd0e1d31440e%402ndquadrant.com -- Arthur Zakirov Postgres Professional: http://www.postgrespro.com Russian Postgres Company
Re: [PROPOSAL] Shared Ispell dictionaries
Hi, On 2018-02-07 19:28:29 +0300, Arthur Zakirov wrote: > + { > + {"max_shared_dictionaries_size", PGC_POSTMASTER, RESOURCES_MEM, > + gettext_noop("Sets the maximum size of all text search > dictionaries loaded into shared memory."), > + gettext_noop("Currently controls only loading of Ispell > dictionaries. " > + "If total size of > simultaneously loaded dictionaries " > + "reaches the maximum allowed > size then a new dictionary " > + "will be loaded into local > memory of a backend."), > + GUC_UNIT_KB, > + }, > + &max_shared_dictionaries_size, > + 100 * 1024, 0, MAX_KILOBYTES, > + NULL, NULL, NULL > + }, So this uses shared memory, allocated at server start? That doesn't seem right. Wouldn't it make more sense to have a 'num_shared_dictionaries' GUC, and then allocate them with dsm? Or even better not have any such limit and use a dshash table to point to individual loaded tables? Is there any chance we can instead convert dictionaries into a form we can just mmap() into memory? That'd scale a lot higher and more dynamically? Regards, Andres
Re: [PROPOSAL] Shared Ispell dictionaries
On Thu, Jan 25, 2018 at 07:51:58PM +0300, Arthur Zakirov wrote: > Attached new version of the patch. Here is a rebased version of the patch due to changes in dict_ispell.c. The patch itself wasn't changed. -- Arthur Zakirov Postgres Professional: http://www.postgrespro.com Russian Postgres Company diff --git a/src/backend/tsearch/spell.c b/src/backend/tsearch/spell.c index b9fdd77e19..e071994523 100644 --- a/src/backend/tsearch/spell.c +++ b/src/backend/tsearch/spell.c @@ -78,6 +78,8 @@ #define tmpalloc(sz) MemoryContextAlloc(Conf->buildCxt, (sz)) #define tmpalloc0(sz) MemoryContextAllocZero(Conf->buildCxt, (sz)) +#define tmpstrdup(str) MemoryContextStrdup(Conf->buildCxt, (str)) + /* * Prepare for constructing an ISpell dictionary. * @@ -498,7 +500,7 @@ NIAddSpell(IspellDict *Conf, const char *word, const char *flag) Conf->Spell[Conf->nspell] = (SPELL *) tmpalloc(SPELLHDRSZ + strlen(word) + 1); strcpy(Conf->Spell[Conf->nspell]->word, word); Conf->Spell[Conf->nspell]->p.flag = (*flag != '\0') - ? cpstrdup(Conf, flag) : VoidString; + ? tmpstrdup(flag) : VoidString; Conf->nspell++; } @@ -1040,7 +1042,7 @@ setCompoundAffixFlagValue(IspellDict *Conf, CompoundAffixFlag *entry, entry->flag.i = i; } else - entry->flag.s = cpstrdup(Conf, s); + entry->flag.s = tmpstrdup(s); entry->flagMode = Conf->flagMode; entry->value = val; @@ -1536,6 +1538,9 @@ nextline: return; isnewformat: + pfree(recoded); + pfree(pstr); + if (oldformat) ereport(ERROR, (errcode(ERRCODE_CONFIG_FILE_ERROR), diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index c45979dee4..725473b7c2 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -1364,6 +1364,35 @@ include_dir 'conf.d' + + max_shared_dictionaries_size (integer) + + max_shared_dictionaries_size configuration parameter + + + + +Sets the maximum size of all text search dictionaries loaded into shared +memory. The default is 100 megabytes (100MB). This +parameter can only be set at server start. 
+ + + +Currently controls only loading of Ispell +dictionaries (see ). +After compiling the dictionary it will be copied into shared memory. +Other backends will then use it from shared memory on first use of the +dictionary, so they don't need to compile the dictionary a second time. + + + +If the total size of simultaneously loaded dictionaries reaches the maximum +allowed size, then new dictionaries will be loaded into the local memory of +a backend. + + + + huge_pages (enum) diff --git a/src/backend/commands/tsearchcmds.c b/src/backend/commands/tsearchcmds.c index 3a843512d1..b6aeae449b 100644 --- a/src/backend/commands/tsearchcmds.c +++ b/src/backend/commands/tsearchcmds.c @@ -39,6 +39,7 @@ #include "nodes/makefuncs.h" #include "parser/parse_func.h" #include "tsearch/ts_cache.h" +#include "tsearch/ts_shared.h" #include "tsearch/ts_utils.h" #include "utils/builtins.h" #include "utils/fmgroids.h" @@ -396,7 +397,8 @@ verify_dictoptions(Oid tmplId, List *dictoptions) * Call the init method and see if it complains. We don't worry about * it leaking memory, since our command will soon be over anyway. 
*/ - (void) OidFunctionCall1(initmethod, PointerGetDatum(dictoptions)); + (void) OidFunctionCall2(initmethod, PointerGetDatum(dictoptions), + ObjectIdGetDatum(InvalidOid)); } ReleaseSysCache(tup); @@ -513,6 +515,8 @@ RemoveTSDictionaryById(Oid dictId) CatalogTupleDelete(relation, &tup->t_self); + ts_dict_shmem_release(dictId); + ReleaseSysCache(tup); heap_close(relation, RowExclusiveLock); diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c index 0c86a581c0..c7dce8cac5 100644 --- a/src/backend/storage/ipc/ipci.c +++ b/src/backend/storage/ipc/ipci.c @@ -44,6 +44,7 @@ #include "storage/procsignal.h" #include "storage/sinvaladt.h" #include "storage/spin.h" +#include "tsearch/ts_shared.h" #include "utils/backend_random.h" #include "utils/snapmgr.h" @@ -150,6 +151,7 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port) size = add_size(size, SyncScanShmemSize()); size = add_size(size, AsyncShmemSize()); size = add_size(size, BackendRandomShmemSize()); + size = add_size(size, TsearchShmemSize()); #ifdef EXEC_BACKEND size = add_size(size, ShmemBackendArraySize()); #endif @@ -271,6 +273,11 @@
Re: [PROPOSAL] Shared Ispell dictionaries
Hello, Thank you for your review! Good catches. On Thu, Jan 25, 2018 at 03:26:46PM +0300, Ildus Kurbangaliev wrote: > In 0001 there are a few lines where only the indentation has changed. Fixed. > 0002: > - TsearchShmemSize - calculating size using hash_estimate_size seems > redundant since you use DSA hash now. Fixed. True, there is no need for hash_estimate_size anymore. > - ts_dict_shmem_release - LWLockAcquire in the beginning makes no > sense, since dict_table couldn't change anyway. Fixed. In an earlier version tsearch_ctl was used here, but I forgot to remove the LWLockAcquire. > 0003: > - ts_dict_shmem_location could return IspellDictData, it makes more > sense. I assume that ts_dict_shmem_location can be used by various types of dictionaries, not only by Ispell. So void * is more suitable here. > 0006: > It's very subjective, but I think it would be nicer to call the option > Shared (as a property of the dictionary) or UseSharedMemory, the boolean > option called SharedMemory sounds weird. Agree. In our offline conversation we settled on Shareable, that is, the dictionary can be shared. It may be more appropriate because setting Shareable=true doesn't guarantee that a dictionary will be allocated in shared memory, due to the max_shared_dictionaries_size GUC. Attached is a new version of the patch. -- Arthur Zakirov Postgres Professional: http://www.postgrespro.com Russian Postgres Company diff --git a/src/backend/tsearch/spell.c b/src/backend/tsearch/spell.c index b9fdd77e19..e071994523 100644 --- a/src/backend/tsearch/spell.c +++ b/src/backend/tsearch/spell.c @@ -78,6 +78,8 @@ #define tmpalloc(sz) MemoryContextAlloc(Conf->buildCxt, (sz)) #define tmpalloc0(sz) MemoryContextAllocZero(Conf->buildCxt, (sz)) +#define tmpstrdup(str) MemoryContextStrdup(Conf->buildCxt, (str)) + /* * Prepare for constructing an ISpell dictionary. 
* @@ -498,7 +500,7 @@ NIAddSpell(IspellDict *Conf, const char *word, const char *flag) Conf->Spell[Conf->nspell] = (SPELL *) tmpalloc(SPELLHDRSZ + strlen(word) + 1); strcpy(Conf->Spell[Conf->nspell]->word, word); Conf->Spell[Conf->nspell]->p.flag = (*flag != '\0') - ? cpstrdup(Conf, flag) : VoidString; + ? tmpstrdup(flag) : VoidString; Conf->nspell++; } @@ -1040,7 +1042,7 @@ setCompoundAffixFlagValue(IspellDict *Conf, CompoundAffixFlag *entry, entry->flag.i = i; } else - entry->flag.s = cpstrdup(Conf, s); + entry->flag.s = tmpstrdup(s); entry->flagMode = Conf->flagMode; entry->value = val; @@ -1536,6 +1538,9 @@ nextline: return; isnewformat: + pfree(recoded); + pfree(pstr); + if (oldformat) ereport(ERROR, (errcode(ERRCODE_CONFIG_FILE_ERROR), diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index 45b2af14eb..46617df852 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -1364,6 +1364,35 @@ include_dir 'conf.d' + + max_shared_dictionaries_size (integer) + + max_shared_dictionaries_size configuration parameter + + + + +Sets the maximum size of all text search dictionaries loaded into shared +memory. The default is 100 megabytes (100MB). This +parameter can only be set at server start. + + + +Currently controls only loading of Ispell +dictionaries (see ). +After compiling the dictionary it will be copied into shared memory. +Other backends will then use it from shared memory on first use of the +dictionary, so they don't need to compile the dictionary a second time. + + + +If the total size of simultaneously loaded dictionaries reaches the maximum +allowed size, then new dictionaries will be loaded into the local memory of +a backend. 
+ + + + huge_pages (enum) diff --git a/src/backend/commands/tsearchcmds.c b/src/backend/commands/tsearchcmds.c index bdf3857ce4..42be77d045 100644 --- a/src/backend/commands/tsearchcmds.c +++ b/src/backend/commands/tsearchcmds.c @@ -39,6 +39,7 @@ #include "nodes/makefuncs.h" #include "parser/parse_func.h" #include "tsearch/ts_cache.h" +#include "tsearch/ts_shared.h" #include "tsearch/ts_utils.h" #include "utils/builtins.h" #include "utils/fmgroids.h" @@ -396,7 +397,8 @@ verify_dictoptions(Oid tmplId, List *dictoptions) * Call the init method and see if it complains. We don't worry about * it leaking memory, since our command will soon be over anyway. */ - (void) OidFunctionCall1(initmethod, PointerGetDatum(dictoptions)); + (void) OidFunctionCall2(initmethod, PointerGetDatum(dictoptions), + ObjectIdGetDatum(InvalidOid)); }
Re: [PROPOSAL] Shared Ispell dictionaries
On Wed, 24 Jan 2018 20:20:41 +0300 Arthur Zakirov wrote: Hi, I did some review of the patch. In 0001 there are a few lines where only the indentation has changed. 0002: - TsearchShmemSize - calculating the size using hash_estimate_size seems redundant since you use a DSA hash now. - ts_dict_shmem_release - LWLockAcquire in the beginning makes no sense, since dict_table couldn't change anyway. 0003: - ts_dict_shmem_location could return IspellDictData, it makes more sense. 0006: It's very subjective, but I think it would be nicer to call the option Shared (as a property of the dictionary) or UseSharedMemory; the boolean option called SharedMemory sounds weird. Overall the patches look good, all tests passed. I tried to break it in a few places where I thought it could be unsafe but did not succeed. -- Ildus Kurbangaliev Postgres Professional: http://www.postgrespro.com Russian Postgres Company
Re: [PROPOSAL] Shared Ispell dictionaries
2018-01-24 20:57 GMT+03:00 Tomas Vondra: > > Thanks. I don't have time to review/test this before FOSDEM, but a > couple of comments regarding some of the points you mentioned. > Thank you for your thoughts. > > I thought about it. And it seems to me that we can use functions > > ts_unload() and ts_reload() instead of new syntax. We already have > > text search functions like ts_lexize() and ts_debug(), and it is > > better to keep consistency. > > This argument seems a bit strange. Both ts_lexize() and ts_debug() are > operating on text values, and are meant to be executed as functions from > SQL - particularly ts_lexize(). It's hard to imagine this implemented as > DDL commands. > > The unload/reload is something that operates on a database object > (dictionary), which already has create/drop/alter DDL. So it seems > somewhat natural to treat unload/reload as another DDL action. > > Taken to an extreme, this argument would essentially mean we should not > have any DDL commands because we have SQL functions. > > That being said, I'm not particularly attached to having this DDL now. > Implementing it seems straight-forward (particularly when we already > have the stuff implemented as functions), and some of the other open > questions seem more important to tackle now. > I understood your opinion. I don't have a strong opinion on the subject yet. And I agree that they can be implemented as future improvements for shared dictionaries. -- Arthur Zakirov Postgres Professional: http://www.postgrespro.com Russian Postgres Company
Re: [PROPOSAL] Shared Ispell dictionaries
Hi, On 01/24/2018 06:20 PM, Arthur Zakirov wrote: > On Sat, Jan 13, 2018 at 06:22:41PM +0300, Arthur Zakirov wrote: >> I think your proposals may be implemented in several patches, so >> they can be applied independently but consistently. I suppose I >> will prepare a new version of the patch with fixes and with initial >> design of new functions and commands soon. > > I attached a new version of the patch. > Thanks. I don't have time to review/test this before FOSDEM, but a couple of comments regarding some of the points you mentioned. >> 3) How do I unload a dictionary from the shared memory? >> ... >> ALTER TEXT SEARCH DICTIONARY x UNLOAD >> >> 4) How do I reload a dictionary? >> ... >> ALTER TEXT SEARCH DICTIONARY x RELOAD > > I thought about it. And it seems to me that we can use functions > ts_unload() and ts_reload() instead of new syntax. We already have > text search functions like ts_lexize() and ts_debug(), and it is > better to keep consistency. This argument seems a bit strange. Both ts_lexize() and ts_debug() are operating on text values, and are meant to be executed as functions from SQL - particularly ts_lexize(). It's hard to imagine this implemented as DDL commands. The unload/reload is something that operates on a database object (dictionary), which already has create/drop/alter DDL. So it seems somewhat natural to treat unload/reload as another DDL action. Taken to an extreme, this argument would essentially mean we should not have any DDL commands because we have SQL functions. That being said, I'm not particularly attached to having this DDL now. Implementing it seems straight-forward (particularly when we already have the stuff implemented as functions), and some of the other open questions seem more important to tackle now. > I think there are two approaches for ts_unload(): > - use DSM's pin and unpin > methods and the invalidation callback, as > it was done when fixing the memory leak. 
It has the drawback that it won't > have an immediate effect, because the DSM will be released only when all > backends unpin the DSM mapping. > - use DSA and the dsa_free() method. As far as I understand, dsa_free() > frees allocated memory immediately. But it requires more work to do, > because we will need some more locks. For instance, what happens > when someone calls ts_lexize() and someone else calls dsa_free() at > the same time. No opinion on this yet, I have to think about it for a bit and look at the code first. >> 7) You mentioned you had to get rid of the compact_palloc0 - can you >> elaborate a bit why that was necessary? Also, when benchmarking the >> impact of this make sure to measure not only the time but also memory >> consumption. > It seems to me that there is no need for compact_palloc0() anymore. Tests > show that the Czech dictionary doesn't consume more memory after the > patch. > That's interesting. I'll do some additional tests to verify the finding. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PROPOSAL] Shared Ispell dictionaries
On Sat, Jan 13, 2018 at 06:22:41PM +0300, Arthur Zakirov wrote: > I think your proposals may be implemented in several patches, so they can > be applied independently but consistently. I suppose I will prepare a new > version of the patch with fixes and with initial design of new functions > and commands soon. I attached a new version of the patch. 0001-Fix-ispell-memory-handling-v3.patch: > allocate memory in the buildCxt. What about adding tmpstrdup to copy a > string into the context? I admit this is mostly nitpicking though. Fixed. Added tmpstrdup. 0002-Retreive-shmem-location-for-ispell-v3.patch: dshash.c is now used instead of dynahash.c. The hash table is created during the first call of a text search function in an instance. The hash table uses the OID of a dictionary instead of file names, so there is no cross-db sharing at all. Added the max_shared_dictionaries_size GUC instead of shared_dictionaries. In the current version it can be set only at server start. If a dictionary is allocated in a backend's memory instead of shared memory, then a LOG message is raised which includes the OID of the dictionary. Fixed the memory leak. When a dictionary is removed or the dictionary cache is invalidated, ts_dict_shmem_release() is called. It unpins the mapping of a dictionary; if the reference count reaches zero, the DSM segment will be unpinned, so the allocated shared memory will be released by Postgres. 0003-Store-ispell-structures-in-shmem-v3.patch: Added documentation fixes. dispell_init() (tmplinit too) has a second argument, dictid. 0004-Update-tmplinit-arguments-v3.patch: It is necessary to fix all dictionaries, including contrib extensions, because of the second argument for tmplinit. tmplinit has the following signature now: dict_init(internal, internal) 0005-pg-ts-shared-dictinaries-view-v3.patch: Added the pg_ts_shared_dictionaries() function and the pg_ts_shared_dictionaries system view. 
They return a list of dictionaries currently in shared memory, with the columns: - dictoid - schemaname - dictname - size 0006-Shared-memory-ispell-option-v3.patch: Added the SharedMemory option for the Ispell dictionary template. It is true by default, because I think it would be good if people didn't have to do anything to allocate dictionaries in shared memory. Setting SharedMemory=false during ALTER TEXT SEARCH DICTIONARY doesn't have an immediate effect. That is because ALTER doesn't force invalidation of the dictionary cache, if I'm not mistaken. > 3) How do I unload a dictionary from the shared memory? > ... > ALTER TEXT SEARCH DICTIONARY x UNLOAD > > 4) How do I reload a dictionary? > ... > ALTER TEXT SEARCH DICTIONARY x RELOAD I thought about it. And it seems to me that we can use functions ts_unload() and ts_reload() instead of new syntax. We already have text search functions like ts_lexize() and ts_debug(), and it is better to keep consistency. I think there are two approaches for ts_unload(): - use DSM's pin and unpin methods and the invalidation callback, as it was done when fixing the memory leak. It has the drawback that it won't have an immediate effect, because the DSM will be released only when all backends unpin the DSM mapping. - use DSA and the dsa_free() method. As far as I understand, dsa_free() frees allocated memory immediately. But it requires more work, because we will need some more locks. For instance, what happens when someone calls ts_lexize() and someone else calls dsa_free() at the same time. > 7) You mentioned you had to get rid of the compact_palloc0 - can you > elaborate a bit why that was necessary? Also, when benchmarking the > impact of this make sure to measure not only the time but also memory > consumption. It seems to me that there is no need for compact_palloc0() anymore. Tests show that the Czech dictionary doesn't consume more memory after the patch. Tests - I've measured the creation time of dictionaries on my 64-bit machine. You can get them from [1]. 
Here the master is 434e6e1484418c55561914600de9e180fc408378. I've measured the French dictionary too because it has an even bigger affix file than the Czech dictionary. With patch: czech_hunspell - 247 ms english_hunspell - 59 ms french_hunspell - 103 ms Master: czech_hunspell - 224 ms english_hunspell - 52 ms french_hunspell - 101 ms Memory: With patch (shared memory size + backend's memory): czech_hunspell - 9573049 + 192584 total in 5 blocks; 1896 free (11 chunks); 190688 used english_hunspell - 1985299 + 21064 total in 6 blocks; 7736 free (13 chunks); 13328 used french_hunspell - 4763456 + 626960 total in 7 blocks; 7680 free (14 chunks); 619280 used Here the French dictionary uses more backend memory because it has a big affix file. Regular expression structures are still stored in backend memory. Master (backend's memory): czech_hunspell - 17181544 total in 2034 blocks; 3584 free (10 chunks); 17177960 used english_hunspell - 4160120 total in 506 blocks; 2792 free (10 chunks); 4157328 used french_hunspell - 11439184 total in 1187 blocks; 18832 free (171
Re: [PROPOSAL] Shared Ispell dictionaries
On 01/15/2018 08:02 PM, Arthur Zakirov wrote: > On Sat, Jan 13, 2018 at 10:33:14PM +0100, Tomas Vondra wrote: >> Not sure if we really need to add the database/schema OIDs. I mentioned >> the unexpected consequences (cross-db sharing) but maybe that's a >> feature we should keep (it reduces memory usage). So perhaps this should >> be another CREATE TEXT SEARCH DICTIONARY parameter, allowing sharing the >> dictionary with other databases? >> >> Aren't we overengineering this? > Another related problem I've noticed is a memory leak. When a > dictionary is loaded and then dropped, it won't be unloaded. > Good point. > I see several approaches: >> 1 - Use the Oid of the dictionary itself as the key instead of dictfile and > afffile. When the dictionary is dropped it will be easily unloaded if it > was loaded. Implementation should be easy, but the drawback is more memory > consumption. > 2 - Use a reference counter with cross-db sharing. When the dictionary is > loaded the counter increases. If all records of a loaded dictionary are dropped > it will be unloaded. > 3 - Or reference counters without cross-db sharing, to avoid possible > confusion. > Here dictfile, afffile and the database Oid will be used as the key. > I think you're approaching the problem from the wrong end, hence asking the wrong question. I think the primary question is "Do we want to share dictionaries cross databases?" and the answer will determine which of the three options is the right one. Another important consideration is the complexity of the patch. In fact, I suggest to make it your goal to make the initial patch as simple as possible. If something is "nice to have" it may wait for v2. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PROPOSAL] Shared Ispell dictionaries
On Sat, Jan 13, 2018 at 10:33:14PM +0100, Tomas Vondra wrote: > Not sure if we really need to add the database/schema OIDs. I mentioned > the unexpected consequences (cross-db sharing) but maybe that's a > feature we should keep (it reduces memory usage). So perhaps this should > be another CREATE TEXT SEARCH DICTIONARY parameter, allowing sharing the > dictionary with other databases? > > Aren't we overengineering this? Another related problem I've noticed is a memory leak. When a dictionary is loaded and then dropped, it won't be unloaded. I see several approaches: 1 - Use the Oid of the dictionary itself as the key instead of dictfile and afffile. When the dictionary is dropped it will be easily unloaded if it was loaded. Implementation should be easy, but the drawback is more memory consumption. 2 - Use a reference counter with cross-db sharing. When the dictionary is loaded the counter increases. If all records of a loaded dictionary are dropped, it will be unloaded. 3 - Or reference counters without cross-db sharing, to avoid possible confusion. Here dictfile, afffile and the database Oid will be used as the key. -- Arthur Zakirov Postgres Professional: http://www.postgrespro.com Russian Postgres Company
Re: [PROPOSAL] Shared Ispell dictionaries
On 01/13/2018 04:22 PM, Arthur Zakirov wrote: > Hello, > > Thank you Tomas for your review. > > On Sat, Jan 13, 2018 at 03:25:55AM +0100, Tomas Vondra wrote: >> allocate memory in the buildCxt. What about adding tmpstrdup to copy a >> string into the context? I admit this is mostly nitpicking though. > ... snip ...> >> 8) One more thing - I've noticed that the hash table uses this key: >> ... >> That is, full paths to the two files, and I'm not sure that's a very >> good idea. Firstly, it's a bit wasteful (1kB per path). But more >> importantly it means all dictionaries referencing the same files will >> share the same chunk of shared memory - not only within a single >> database, but across the whole cluster. That may lead to surprising >> behavior, because e.g. unloading a dictionary in one database will >> affect dictionaries in all other databases referencing the same files. > Hm, indeed. It's worth using only file names instead of full paths. And > it would be a good idea to use more information besides file names. It can be > the Oid of a database and the Oid of a namespace maybe, because a > dictionary can be created in different schemas. > I doubt using filenames (without the directory paths) solves anything, really. The keys still have to be MAXPGPATH because someone could create a very long filename. But I don't think memory consumption is such a big deal, really. With 1000 dictionaries it's still just ~2MB of data, which is negligible compared to the amount of memory saved by sharing the dictionaries. Not sure if we really need to add the database/schema OIDs. I mentioned the unexpected consequences (cross-db sharing) but maybe that's a feature we should keep (it reduces memory usage). So perhaps this should be another CREATE TEXT SEARCH DICTIONARY parameter, allowing sharing the dictionary with other databases? Aren't we overengineering this? > I think your proposals may be implemented in several patches, so they can be applied independently but consistently. 
I suppose I will prepare a new > version of the patch with fixes and with initial design of new functions > and commands soon. > Yes, splitting patches into smaller, more focused bits is a good idea. BTW the current patch fails to document the dictionary sharing. It only mentions it when describing the shared_dictionaries GUC. IMHO the right place for additional details is https://www.postgresql.org/docs/10/static/textsearch-dictionaries.html regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PROPOSAL] Shared Ispell dictionaries
Hi Arthur, I've done some initial review of the patch today, and here are some thoughts: 0001-Fix-ispell-memory-handling-v2.patch This makes sense. The patch simply replaces two cpstrdup() calls with MemoryContextStrdup, but I see spell.c already has two macros to allocate memory in the buildCxt. What about adding tmpstrdup to copy a string into the context? I admit this is mostly nitpicking though. 0002-Retreive-shmem-location-for-ispell-v2.patch I think the GUC name should make it clear it's a maximum number of something, just like "max_parallel_workers" and other such GUCs. When I first saw "shared_dictionaries" in the patch I thought it was a list of dictionary names, or something like that. I have a bunch of additional design questions and proposals (not necessarily required for v1, but perhaps useful for shaping it). 1) Why do we actually need the limit? Is it really necessary / useful? When I wrote shared_ispell back in 2012, all we had were fixed segments allocated at start, and so similar limits were a built-in restriction. But after the DSM stuff was introduced I imagined it would not be necessary. I realize the current implementation requires that, because the hash table is still created in an old-style memory context (and only the dictionaries are in DSM segments). But that seems fairly straightforward to fix by maintaining the hash table in a separate DSM segment too. So lookup of the dictionary DSM would have to first check what the current hash table segment is, and then continue as now. I'm not sure if dynahash can live in a DSM segment, but we already have a hash table that supports that in dshash.c (which is also concurrent, although I'm not sure if that's a major advantage for this use case). 2) Do we actually want/need some limits? Which ones? That is not to say we don't need/want some limits, but the current limit may not be the droid we're looking for, for a couple of reasons. 
Firstly, currently it only matters during startup, when the dynahash is created. So to change the limit (e.g. to increase it) you actually have to restart the database, which is obviously a major hassle. Secondly, dynahash tweaks the values to get proper behavior. For example, it's not using the values directly but some higher value of 2^N form, which means the limit may not be enforced immediately when hitting the GUC, but unexpectedly somewhat later. And finally, I believe this is log-worthy - right now the dictionary load silently switches to backend memory (thus incurring all the parsing overhead). This certainly deserves at least a log message. Actually, I'm not sure "number of dictionaries" is a particularly useful limit in the first place - that's not a number I really care about. But I do care about the amount of memory consumed by the loaded dictionaries. So I do suggest adding such a "max memory for shared dictionaries" limit. I'm not sure we can enforce it strictly, because when deciding where to load the dict we haven't parsed it yet and so don't know how much memory will be required. But I believe a lazy check should be fine (load it, and if we've exceeded the total memory, disable loading additional ones). 3) How do I unload a dictionary from the shared memory? Assume we've reached the limit (it does not matter if it's the number of dictionaries or memory used by them). How do I resolve that without restarting the database? How do I unload a dictionary (which may be unused) from shared memory? ALTER TEXT SEARCH DICTIONARY x UNLOAD 4) How do I reload a dictionary? Assume I've updated the dictionary files (added new words into the files, or something like that). How do I reload the dictionary? Do I have to restart the server, DROP/CREATE everything again, or what? What about instead having something like this: ALTER TEXT SEARCH DICTIONARY x RELOAD 5) Actually, how do I list currently loaded dictionaries (and how much memory they use in the shared memory)? 
6) What other restrictions would be useful? I think it should be possible to specify which ispell dictionaries may be loaded into shared memory, and which should be always loaded into local backend memory. That is, something like CREATE TEXT SEARCH DICTIONARY x ( TEMPLATE = ispell, DictFile = czech, AffFile = czech, StopWords = czech, SharedMemory = true/false (default: false) ); because otherwise the dictionaries will compete for shared memory, and it's unclear which of them will get loaded. For a server with a single application that may not be a huge issue, but think about servers shared by multiple applications, etc. In the extension this was achieved kinda explicitly by definition of a separate 'shared_ispell' template, but if you modify the current one that won't work, of course. 7) You mentioned you had to get rid of the compact_palloc0 - can you elaborate a bit why that was necessary? Also, when benchmarking the impact of this make sure to measure not only the
Re: [PROPOSAL] Shared Ispell dictionaries
Thank you for your answer. On Mon, Jan 08, 2018 at 06:12:37PM +0100, Tomas Vondra wrote: > > I believe Pavel was referring to this extension: > > https://github.com/tvondra/shared_ispell > Oh, understood. > I wasn't going to submit that as in-core solution, but I'm happy you're > making improvements in that direction. I'll take a look at your patch > shortly. > Here is the second version of the patch. But I've noticed a performance regression in ts_lexize() and I will try to find where the overhead hides. -- Arthur Zakirov Postgres Professional: http://www.postgrespro.com Russian Postgres Company
Re: [PROPOSAL] Shared Ispell dictionaries
Hi Arthur, Sorry for the delay, I somehow missed this thread ... On 12/27/2017 10:20 AM, Arthur Zakirov wrote: > On Tue, Dec 26, 2017 at 07:03:48PM +0100, Pavel Stehule wrote: >> >> Tomas had some workable patches related to this topic >> > > Tomas, have you planned to propose it? > I believe Pavel was referring to this extension: https://github.com/tvondra/shared_ispell I wasn't going to submit that as an in-core solution, but I'm happy you're making improvements in that direction. I'll take a look at your patch shortly. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PROPOSAL] Shared Ispell dictionaries
On Sun, Dec 31, 2017 at 06:28:13PM +0300, Arthur Zakirov wrote: > > There are issues to do: > - add the GUC-variable for hash table limit > - fix bugs > - improve comments > - performance testing > Here is the second version of the patch. 0002-Retreive-shmem-location-for-ispell-v2.patch: Fixed some bugs and added the GUC variable "shared_dictionaries". Added documentation for it. I'm not sure about the order of configuration parameters in section "19.4.1. Memory". Now "shared_dictionaries" goes after "shared_buffers". Maybe it would be good to make a patch which will sort the parameters in alphabetical order? 0003-Store-ispell-structures-in-shmem-v2.patch: Fixed some bugs; regression tests pass now. I added more comments and fixed old ones. I also tested with Hunspell dictionaries [1]. They are good too. Results of performance testing of Ispell and Hunspell dictionaries will be ready soon. 1 - github.com/postgrespro/hunspell_dicts -- Arthur Zakirov Postgres Professional: http://www.postgrespro.com Russian Postgres Company diff --git a/src/backend/tsearch/spell.c b/src/backend/tsearch/spell.c index b9fdd77e19..25614f2d31 100644 --- a/src/backend/tsearch/spell.c +++ b/src/backend/tsearch/spell.c @@ -498,7 +498,7 @@ NIAddSpell(IspellDict *Conf, const char *word, const char *flag) Conf->Spell[Conf->nspell] = (SPELL *) tmpalloc(SPELLHDRSZ + strlen(word) + 1); strcpy(Conf->Spell[Conf->nspell]->word, word); Conf->Spell[Conf->nspell]->p.flag = (*flag != '\0') - ? cpstrdup(Conf, flag) : VoidString; + ? 
MemoryContextStrdup(Conf->buildCxt, flag) : VoidString; Conf->nspell++; } @@ -1040,7 +1040,7 @@ setCompoundAffixFlagValue(IspellDict *Conf, CompoundAffixFlag *entry, entry->flag.i = i; } else - entry->flag.s = cpstrdup(Conf, s); + entry->flag.s = MemoryContextStrdup(Conf->buildCxt, s); entry->flagMode = Conf->flagMode; entry->value = val; @@ -1536,6 +1536,9 @@ nextline: return; isnewformat: + pfree(recoded); + pfree(pstr); + if (oldformat) ereport(ERROR, (errcode(ERRCODE_CONFIG_FILE_ERROR), diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index e4a01699e4..858423354e 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -1355,6 +1355,36 @@ include_dir 'conf.d' + + shared_dictionaries (integer) + + shared_dictionaries configuration parameter + + + + +Sets the maximum number of text search dictionaries loaded into shared +memory. The default is 10 dictionaries. + + + +Currently controls only loading of Ispell +dictionaries (see ). +After compiling the dictionary it will be copied into shared memory. +Another backends on first use of the dictionary will use it from shared +memory, so it doesn't need to compile the dictionary second time. +DictFile and AffFile are used to +search the dictionary in shared memory. + + + +If the number of simultaneously loaded dictionaries reaches the maximum +allowed number then a new dictionary will be loaded into local memory of +a backend. 
+ + + + huge_pages (enum) diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c index 0c86a581c0..c7dce8cac5 100644 --- a/src/backend/storage/ipc/ipci.c +++ b/src/backend/storage/ipc/ipci.c @@ -44,6 +44,7 @@ #include "storage/procsignal.h" #include "storage/sinvaladt.h" #include "storage/spin.h" +#include "tsearch/ts_shared.h" #include "utils/backend_random.h" #include "utils/snapmgr.h" @@ -150,6 +151,7 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port) size = add_size(size, SyncScanShmemSize()); size = add_size(size, AsyncShmemSize()); size = add_size(size, BackendRandomShmemSize()); + size = add_size(size, TsearchShmemSize()); #ifdef EXEC_BACKEND size = add_size(size, ShmemBackendArraySize()); #endif @@ -271,6 +273,11 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port) AsyncShmemInit(); BackendRandomShmemInit(); + /* +* Set up shared memory to tsearch +*/ + TsearchShmemInit(); + #ifdef EXEC_BACKEND /* diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c index 71caac1a1f..2446db7266 100644 --- a/src/backend/storage/lmgr/lwlock.c +++ b/src/backend/storage/lmgr/lwlock.c @@ -520,6 +520,7 @@ RegisterLWLockTranches(void) "shared_tuplestore"); LWLockRegisterTranche(LWTRANCHE_TBM, "tbm"); LWLockRegisterTranche(LWTRANCHE_PARALLEL_APPEND, "parallel_append"); +
Re: [PROPOSAL] Shared Ispell dictionaries
Hello, hackers, On Tue, Dec 26, 2017 at 07:48:27PM +0300, Arthur Zakirov wrote: > The patch will be ready and added into the 2018-03 commitfest. > I attached the patch itself. 0001-Fix-ispell-memory-handling.patch: Some strings are allocated via compact_palloc0(). But they are not persistent, so they should be allocated using a temporary memory context. Also, a couple of strings are not released if the .aff file had the new format. 0002-Retreive-shmem-location-for-ispell.patch: Adds the ispell_shmem_location() function, which looks for the location of a dictionary using the .dict and .aff file names. If the location hasn't been allocated in DSM earlier, it is allocated. A shared hash table is used to search for the location. The maximum number of hash table elements is NUM_DICTIONARIES=20 for now. It will be better to use a GUC variable. Also, if the number of elements reaches the limit, it will be good to use the backend's local memory instead of shared memory. 0003-Store-ispell-structures-in-shmem.patch: Introduces the IspellDictBuild and IspellDictData structures and removes the IspellDict structure. IspellDictBuild is used while building the dictionary, if it hasn't been allocated in DSM earlier, within the dispell_build() function. IspellDictBuild has a pointer to the IspellDictData structure, which will be filled with persistent data. After building the dictionary, IspellDictData is copied into the DSM location and the temporary data of IspellDictBuild is released. All prefix trees are stored as flat arrays now. Those arrays are allocated and stored using the NodeArray struct. The required node can be retrieved by its node offset. The AffixData and Affix arrays have an additional offset array to retrieve an element by index. The Affix field (an array of AFFIX) of IspellDictBuild is persistent data too, but it is constructed as a temporary array first, since the Affix array needs to be sorted via qsort() within NISortAffixes(). 
So IspellDictData stores: - AffixData - array of strings, accessed via AffixDataOffset - Affix - array of AFFIX, accessed via AffixOffset - DictNodes, PrefixNodes, SuffixNodes - prefix trees as plain arrays - CompoundAffix - array of CMPDAffix, accessed sequentially I had to remove compact_palloc0(), added by Pavel in 3e5f9412d0a818be77c974e5af710928097b91f3. The Ispell dictionary doesn't need such allocation anymore; it was used for many small allocations. I will definitely check the performance of the Czech dictionary. There are issues to do: - add the GUC variable for the hash table limit - fix bugs - improve comments - performance testing -- Arthur Zakirov Postgres Professional: http://www.postgrespro.com Russian Postgres Company diff --git a/src/backend/tsearch/spell.c b/src/backend/tsearch/spell.c index 9a09ffb20a..6617c2cf05 100644 --- a/src/backend/tsearch/spell.c +++ b/src/backend/tsearch/spell.c @@ -498,7 +498,7 @@ NIAddSpell(IspellDict *Conf, const char *word, const char *flag) Conf->Spell[Conf->nspell] = (SPELL *) tmpalloc(SPELLHDRSZ + strlen(word) + 1); strcpy(Conf->Spell[Conf->nspell]->word, word); Conf->Spell[Conf->nspell]->p.flag = (*flag != '\0') - ? cpstrdup(Conf, flag) : VoidString; + ? 
MemoryContextStrdup(Conf->buildCxt, flag) : VoidString; Conf->nspell++; } @@ -1040,7 +1040,7 @@ setCompoundAffixFlagValue(IspellDict *Conf, CompoundAffixFlag *entry, entry->flag.i = i; } else - entry->flag.s = cpstrdup(Conf, s); + entry->flag.s = MemoryContextStrdup(Conf->buildCxt, s); entry->flagMode = Conf->flagMode; entry->value = val; @@ -1536,6 +1536,9 @@ nextline: return; isnewformat: + pfree(recoded); + pfree(pstr); + if (oldformat) ereport(ERROR, (errcode(ERRCODE_CONFIG_FILE_ERROR), diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c index 2d1ed143e0..86a6df131b 100644 --- a/src/backend/storage/ipc/ipci.c +++ b/src/backend/storage/ipc/ipci.c @@ -44,6 +44,7 @@ #include "storage/procsignal.h" #include "storage/sinvaladt.h" #include "storage/spin.h" +#include "tsearch/ts_shared.h" #include "utils/backend_random.h" #include "utils/snapmgr.h" @@ -150,6 +151,7 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port) size = add_size(size, SyncScanShmemSize()); size = add_size(size, AsyncShmemSize()); size = add_size(size, BackendRandomShmemSize()); + size = add_size(size, TsearchShmemSize()); #ifdef EXEC_BACKEND size = add_size(size, ShmemBackendArraySize()); #endif @@ -271,6 +273,11 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port) AsyncShmemInit(); BackendRandomShmemInit(); + /* +* Set up shared memory to tsearch +*/ + TsearchShmemInit(); + #ifdef EXEC_BACKEND /* diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c index
Re: [PROPOSAL] Shared Ispell dictionaries
On Tue, Dec 26, 2017 at 07:03:48PM +0100, Pavel Stehule wrote: > > Tomas had some workable patches related to this topic > Tomas, have you planned to propose it? -- Arthur Zakirov Postgres Professional: http://www.postgrespro.com Russian Postgres Company
Re: [PROPOSAL] Shared Ispell dictionaries
Arthur Zakirov wrote: > On Tue, Dec 26, 2017 at 01:55:57PM -0300, Alvaro Herrera wrote: > > So what are you going to use instead? > > [ ... ] > > To allocate IspellDict in this case it is necessary to calculate needed > memory size. I think arrays mentioned above will be built first then > memcpy'ed into IspellDict, if it won't take much time. OK, that sounds sensible on first blush. If there are many processes concurrently doing text searches, then the amount of memory saved may be large enough to justify the additional processing (more so if it's just one more memcpy cycle). I hope that there is some way to cope with the ispell data changing underneath -- maybe you'll need some sort of RCU? > > So this will be a large patch not submitted to 2018-01? Depending on > size/complexity I'm not sure it's OK to submit 2018-03 only -- it may be > too late. > > Oh, I see. I try to prepare the patch while 2018-01 is open. It isn't necessary that the patch you present to 2018-01 is final and complete (so don't kill yourself to achieve that) -- a preliminary patch that reviewers can comment on is enough, as long as the final patch you present to 2018-03 is not *too* different. But any medium-large patch whose first post is to the last commitfest of a cycle is likely to be thrown out to the next cycle's first commitfest very quickly. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PROPOSAL] Shared Ispell dictionaries
Thank you for your feedback. On Tue, Dec 26, 2017 at 01:55:57PM -0300, Alvaro Herrera wrote: > So what are you going to use instead? For example, AffixNode and AffixNodeData represent the prefix tree of an affix list. They are accessed via the Suffix and Prefix pointers of the IspellDict struct now. Instead, all affix nodes should be placed into an array and accessed by an offset. The Suffix array goes first, the Prefix array after it. AffixNodeData will access a child node by an offset too. The AffixNodeData struct has an array of pointers to AFFIX structs. This array with all the AFFIX data can be stored within AffixNodeData. Or AffixNodeData can have an array of indexes into a single AFFIX array, which is stored within IspellDict before or after Suffix and Prefix. The same goes for the prefix tree of a word list, represented by the SPNode struct. It might be stored as an array after the Prefix array. The AffixData and CompoundAffix arrays go after them. To allocate IspellDict in this case it is necessary to calculate the needed memory size. I think the arrays mentioned above will be built first and then memcpy'ed into IspellDict, if it won't take much time. Hope it makes sense and is reasonable. > > So this will be a large patch not submitted to 2018-01? Depending on > size/complexity I'm not sure it's OK to submit 2018-03 only -- it may be > too late. > Oh, I see. I'll try to prepare the patch while 2018-01 is open. -- Arthur Zakirov Postgres Professional: http://www.postgrespro.com Russian Postgres Company
Re: [PROPOSAL] Shared Ispell dictionaries
2017-12-26 17:55 GMT+01:00 Alvaro Herrera: > Arthur Zakirov wrote: > > > Implementation > > -- > > > > It is necessary to change all structures related with IspellDict: > > SPNode, AffixNode, AFFIX, CMPDAffix, IspellDict itself. They all > > shouldn't use pointers for this reason. Others are used only during > > dictionary building. > > So what are you going to use instead? > > > It would be good to store in a shared memory StopList struct too. > > Sure (probably a separate patch though). > > > All fields of IspellDict struct, which are used only during dictionary > > building, will be moved into new IspellDictBuild to decrease needed > > shared memory size. And they are going to be released by buildCxt. > > > > Each dictionary will be stored in its own dsm segment. > > All that sounds reasonable. > > > The patch will be ready and added into the 2018-03 commitfest. > > So this will be a large patch not submitted to 2018-01? Depending on > size/complexity I'm not sure it's OK to submit 2018-03 only -- it may be > too late. > Tomas had some workable patches related to this topic Regards Pavel > > -- > Álvaro Herrera https://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services > >
Re: [PROPOSAL] Shared Ispell dictionaries
Arthur Zakirov wrote: > Implementation > -- > > It is necessary to change all structures related with IspellDict: > SPNode, AffixNode, AFFIX, CMPDAffix, IspellDict itself. They all > shouldn't use pointers for this reason. Others are used only during > dictionary building. So what are you going to use instead? > It would be good to store in a shared memory StopList struct too. Sure (probably a separate patch though). > All fields of IspellDict struct, which are used only during dictionary > building, will be moved into new IspellDictBuild to decrease needed > shared memory size. And they are going to be released by buildCxt. > > Each dictionary will be stored in its own dsm segment. All that sounds reasonable. > The patch will be ready and added into the 2018-03 commitfest. So this will be a large patch not submitted to 2018-01? Depending on size/complexity I'm not sure it's OK to submit 2018-03 only -- it may be too late. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services