Re: [PATCH v5] z3fold: add shrinker

2016-10-18 Thread Vitaly Wool
On Tue, Oct 18, 2016 at 7:35 PM, Dan Streetman  wrote:
> On Tue, Oct 18, 2016 at 12:26 PM, Vitaly Wool  wrote:
>> On Oct 18, 2016, at 18:29, "Dan Streetman" wrote:
>>
>>
>>>
>>> On Tue, Oct 18, 2016 at 10:51 AM, Vitaly Wool 
>>> wrote:
>>> > On Tue, Oct 18, 2016 at 4:27 PM, Dan Streetman 
>>> > wrote:
>>> >> On Mon, Oct 17, 2016 at 10:45 PM, Vitaly Wool 
>>> >> wrote:
>>> >>> Hi Dan,
>>> >>>
>>> >>> On Tue, Oct 18, 2016 at 4:06 AM, Dan Streetman 
>>> >>> wrote:
>>> >>>> On Sat, Oct 15, 2016 at 8:05 AM, Vitaly Wool wrote:
>>> >>>>> This patch implements a shrinker for z3fold. This shrinker
>>> >>>>> implementation does not free up any pages directly, but it allows
>>> >>>>> for a denser placement of compressed objects, which results in
>>> >>>>> fewer pages actually consumed and therefore a higher compression
>>> >>>>> ratio.
>>> >>>>>
>>> >>>>> This update removes z3fold page compaction from the freeing path
>>> >>>>> since we can rely on the shrinker to do the job. Also, a new flag,
>>> >>>>> UNDER_COMPACTION, is introduced to protect against two threads
>>> >>>>> trying to compact the same page.
>>> >>>>
>>> >>>> I'm completely unconvinced that this should be a shrinker.  The
>>> >>>> alloc/free paths are much, much better suited to compacting a page
>>> >>>> than a shrinker that must scan through all the unbuddied pages.  Why
>>> >>>> not just improve compaction for the alloc/free paths?
>>> >>>
>>> >>> Basically, the main reason is performance: I want to avoid
>>> >>> compaction on hot paths as much as possible. This patchset brings
>>> >>> both performance and compression ratio gains, and I'm not sure how
>>> >>> to achieve that by improving compaction on the alloc/free paths.
>>> >>
>>> >> It seems like a tradeoff of slight improvement in hot paths, for
>>> >> significant decrease in performance by adding a shrinker, which will
>>> >> do a lot of unnecessary scanning.  The alloc/free/unmap functions are
>>> >> working directly with the page at exactly the point where compaction
>>> >> is needed - when adding or removing a bud from the page.
>>> >
>>> > I can see that sometimes there are substantial numbers of pages that
>>> > cannot be compacted synchronously because the MIDDLE_CHUNK_MAPPED
>>> > bit is set. Picking those up seems to be a good job for a shrinker,
>>> > and they end up at the beginning of the respective unbuddied lists,
>>> > so the shrinker is set to find them. I can slightly optimize that by
>>> > introducing a COMPACT_DEFERRED flag or something like that to make
>>> > the shrinker find those pages faster. Would that make sense to you?
>>>
>>> Why not just compact the page in z3fold_unmap()?
>>
>> That would give a huge performance penalty (checked).
>
> my core concern with the shrinker is, it has no context about which
> pages need compacting and which don't, while the alloc/free/unmap
> functions do.  If the alloc/free/unmap fast paths are impacted too
> much by compacting directly, then yeah setting a flag for deferred
> action would be better than the shrinker just scanning all pages.
> However, in that case a shrinker still seems unnecessary -
> all the pages that need compacting are pre-marked, there's no need to
> scan any more.  Isn't a simple workqueue to deferred-compact pages
> better?

Well yep, the more I think about it, the more it seems a better fit. I'll
test the workqueue approach performance-wise and get back with a new patch.

~vitaly
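
Whichever context ends up doing the deferred compaction, the
UNDER_COMPACTION flag from the patch description still has to keep two
threads off the same page. The following is only a userspace sketch of
that idea using C11 atomics, not code from the patch: `struct zpage`, the
bit position, and the helper names are all assumptions made for
illustration. In the kernel the same pattern would more likely be written
with test_and_set_bit() and clear_bit() on the page's flag word.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag bit; the real bit position in z3fold is not shown here. */
enum { UNDER_COMPACTION = 1u << 1 };

/* Toy stand-in for a z3fold page header (assumption, not the real struct). */
struct zpage {
	atomic_uint flags;
	int moves;	/* counts compaction passes, for illustration only */
};

/* Returns true only for the one caller that sets the bit first. */
static bool try_begin_compaction(struct zpage *p)
{
	unsigned int old = atomic_fetch_or_explicit(&p->flags, UNDER_COMPACTION,
						    memory_order_acquire);
	return !(old & UNDER_COMPACTION);
}

/* Clears the bit; must only be called by the thread that set it. */
static void end_compaction(struct zpage *p)
{
	unsigned int old = atomic_fetch_and_explicit(&p->flags,
						     ~(unsigned int)UNDER_COMPACTION,
						     memory_order_release);
	assert(old & UNDER_COMPACTION);
}

/* Compacts the page unless another thread is already doing so. */
static bool compact_page(struct zpage *p)
{
	if (!try_begin_compaction(p))
		return false;	/* someone else owns this page right now */
	p->moves++;		/* placeholder for moving the middle chunk */
	end_compaction(p);
	return true;
}
```

A caller that loses the race simply skips the page, so the hot path never
waits on compaction done elsewhere.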


Re: [PATCH v5] z3fold: add shrinker

2016-10-18 Thread Dan Streetman
On Tue, Oct 18, 2016 at 12:26 PM, Vitaly Wool  wrote:
> On Oct 18, 2016, at 18:29, "Dan Streetman" wrote:
>
>
>>
>> On Tue, Oct 18, 2016 at 10:51 AM, Vitaly Wool 
>> wrote:
>> > On Tue, Oct 18, 2016 at 4:27 PM, Dan Streetman 
>> > wrote:
>> >> On Mon, Oct 17, 2016 at 10:45 PM, Vitaly Wool 
>> >> wrote:
>> >>> Hi Dan,
>> >>>
>> >>> On Tue, Oct 18, 2016 at 4:06 AM, Dan Streetman 
>> >>> wrote:
>> >>>> On Sat, Oct 15, 2016 at 8:05 AM, Vitaly Wool wrote:
>> >>>>> This patch implements a shrinker for z3fold. This shrinker
>> >>>>> implementation does not free up any pages directly, but it allows
>> >>>>> for a denser placement of compressed objects, which results in
>> >>>>> fewer pages actually consumed and therefore a higher compression
>> >>>>> ratio.
>> >>>>>
>> >>>>> This update removes z3fold page compaction from the freeing path
>> >>>>> since we can rely on the shrinker to do the job. Also, a new flag,
>> >>>>> UNDER_COMPACTION, is introduced to protect against two threads
>> >>>>> trying to compact the same page.
>> >>>>
>> >>>> I'm completely unconvinced that this should be a shrinker.  The
>> >>>> alloc/free paths are much, much better suited to compacting a page
>> >>>> than a shrinker that must scan through all the unbuddied pages.  Why
>> >>>> not just improve compaction for the alloc/free paths?
>> >>>
>> >>> Basically, the main reason is performance: I want to avoid
>> >>> compaction on hot paths as much as possible. This patchset brings
>> >>> both performance and compression ratio gains, and I'm not sure how
>> >>> to achieve that by improving compaction on the alloc/free paths.
>> >>
>> >> It seems like a tradeoff of slight improvement in hot paths, for
>> >> significant decrease in performance by adding a shrinker, which will
>> >> do a lot of unnecessary scanning.  The alloc/free/unmap functions are
>> >> working directly with the page at exactly the point where compaction
>> >> is needed - when adding or removing a bud from the page.
>> >
>> > I can see that sometimes there are substantial numbers of pages that
>> > cannot be compacted synchronously because the MIDDLE_CHUNK_MAPPED
>> > bit is set. Picking those up seems to be a good job for a shrinker,
>> > and they end up at the beginning of the respective unbuddied lists,
>> > so the shrinker is set to find them. I can slightly optimize that by
>> > introducing a COMPACT_DEFERRED flag or something like that to make
>> > the shrinker find those pages faster. Would that make sense to you?
>>
>> Why not just compact the page in z3fold_unmap()?
>
> That would give a huge performance penalty (checked).

my core concern with the shrinker is, it has no context about which
pages need compacting and which don't, while the alloc/free/unmap
functions do.  If the alloc/free/unmap fast paths are impacted too
much by compacting directly, then yeah setting a flag for deferred
action would be better than the shrinker just scanning all pages.
However, in that case a shrinker still seems unnecessary -
all the pages that need compacting are pre-marked, there's no need to
scan any more.  Isn't a simple workqueue to deferred-compact pages
better?
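
To make the contrast with scanning concrete, here is a minimal userspace
sketch of the "pre-mark, then drain" idea. It is not z3fold code: `struct
page_model`, `struct deferred_queue`, and both helpers are hypothetical
names, COMPACT_DEFERRED only borrows the flag name floated earlier in the
thread, and the queue is deliberately single-threaded (a real version
would need a lock around the list and would presumably hand the drain
step to a `struct work_struct` via queue_work()).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bit, named after the suggestion in the thread. */
enum { COMPACT_DEFERRED = 1u << 0 };

/* Toy stand-in for a z3fold page (assumption, not the real struct). */
struct page_model {
	atomic_uint flags;
	int compacted;			/* compaction passes, for illustration */
	struct page_model *next;	/* link in the deferred list */
};

/* FIFO of pages that asked for compaction; no locking in this sketch. */
struct deferred_queue {
	struct page_model *head, *tail;
	size_t len;
};

/* Hot path (e.g. unmap): mark the page and queue it at most once. */
static bool defer_compaction(struct deferred_queue *q, struct page_model *p)
{
	unsigned int old = atomic_fetch_or(&p->flags, COMPACT_DEFERRED);

	if (old & COMPACT_DEFERRED)
		return false;		/* already queued, nothing to do */
	p->next = NULL;
	if (q->tail)
		q->tail->next = p;
	else
		q->head = p;
	q->tail = p;
	q->len++;
	return true;
}

/* Worker: visits only pre-marked pages, never the unbuddied lists. */
static size_t run_deferred_compaction(struct deferred_queue *q)
{
	size_t done = 0;

	while (q->head) {
		struct page_model *p = q->head;

		q->head = p->next;
		if (!q->head)
			q->tail = NULL;
		q->len--;
		p->next = NULL;
		assert(atomic_load(&p->flags) & COMPACT_DEFERRED);
		atomic_fetch_and(&p->flags, ~(unsigned int)COMPACT_DEFERRED);
		p->compacted++;		/* placeholder for the real compaction */
		done++;
	}
	return done;
}
```

The work done by the drain step is proportional to the number of pages
that were actually marked, which is the property a shrinker scanning
every unbuddied list does not have.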

>
>> >> Sorry if I missed it in earlier emails, but have you done any
>> >> performance measurements comparing with/without the shrinker?  The
>> >> compression ratio gains may be possible with only the
>> >> z3fold_compact_page() improvements, and performance may be stable (or
>> >> better) with only a per-z3fold-page lock, instead of adding the
>> >> shrinker...?
>> >
>> > I'm running some tests with per-page locks now, but according to the
>> > previous measurements the shrinker version always wins on multi-core
>> > platforms.
>>
>> But that comparison is without taking the spinlock in map/unmap right?
>
> Right, but from the recent measurements it looks like per-page locks don't
> slow things down that much.
>
> ~vitaly