Re: [PATCH RFC 0/8] dcache: increase poison resistance

2020-12-16 Thread Junxiao Bi

Hi Konstantin,

How would you like to proceed with this patch set?

The patch set, as it stands, already fixes the customer issue we faced:
it stops the memory fragmentation caused by negative dentries, and our
tests show no performance regression. In production workloads it is
common for an application to keep creating and removing temporary files.
That leaves a lot of negative dentries behind, and over time they
fragment memory until the system falls into memory compaction and
becomes unresponsive. It would be good to get this merged upstream. If
you are busy, we can try pushing it again ourselves.


Thanks,

Junxiao.

On 12/14/20 3:10 PM, Junxiao Bi wrote:

On 12/13/20 11:43 PM, Konstantin Khlebnikov wrote:




On Sun, Dec 13, 2020 at 9:52 PM Junxiao Bi wrote:


    On 12/11/20 11:32 PM, Konstantin Khlebnikov wrote:

    > On Thu, Dec 10, 2020 at 2:01 AM Junxiao Bi <junxiao...@oracle.com> wrote:
    >
    >     Hi Konstantin,
    >
    >     We tested this patch set recently and found that it limits
    >     negative dentries to a small part of total memory. We also did
    >     not see any performance regression with it. Do you have any plan
    >     to integrate it into mainline? It would help a lot with the
    >     memory fragmentation caused by the dentry slab. There were a lot
    >     of customer cases where sys% was very high because most CPUs were
    >     doing memory compaction; the dentry slab was taking too much
    >     memory and nearly all of the dentries in it were negative.
    >
    >
    > Right now I don't have any plans for this. I suspect such problems
    > will appear much more often since machines are getting bigger.
    > So, somebody will take care of it.
    We have already had a lot of customer cases. It makes no sense to
    leave so many negative dentries in the system: they cause memory
    fragmentation and bring little benefit.


Dcache could grow so big only if the system lacks memory pressure.

The simplest solution is a cronjob which provides such pressure by
creating a sparse file on a disk-based fs and then reading it.
This should wash away all inactive caches with no IO and zero chance
of OOM.
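
A minimal sketch of the cronjob idea above, assuming a plain C helper run
periodically. The function name, the path and the size are hypothetical and
not taken from the thread; the size would have to be chosen per system.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create a sparse file of `size` bytes at `path`, read it back once and
 * remove it. Returns the number of bytes read, or -1 on error.
 * `path` should live on a disk-based filesystem, as suggested above.
 */
long long read_sparse_file(const char *path, off_t size)
{
    char buf[1 << 16];
    long long total = 0;
    ssize_t n;
    int fd;

    assert(path != NULL && size >= 0);

    fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    /* ftruncate() extends the file; no data is written, so it stays a hole. */
    if (ftruncate(fd, size) < 0) {
        close(fd);
        unlink(path);
        return -1;
    }

    /* Reading the hole is the step expected to create cache pressure. */
    while ((n = read(fd, buf, sizeof(buf))) > 0)
        total += n;

    close(fd);
    unlink(path);
    return n < 0 ? -1 : total;
}
```

A cron entry could call this with a size comparable to the amount of cache
that should be pushed out. Whether that is enough to keep up with a fast
negative dentry producer would need to be measured.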

Sounds good, will try.


    >
    > The first part, which collects negative dentries at the end of the
    > list of siblings, could be done in a more obvious way by splitting
    > the list in two. But this touches much more code.
    That would add a new field to dentry?


Yep. Decision is up to maintainers.

    >
    > The last patch isn't very rigid but makes non-trivial changes.
    > Probably it's better to call some garbage collector thingy
    > periodically.
    > The LRU list needs pressure to age and reorder entries properly.

    Swap the negative dentry to the head of the hash list when it gets
    accessed? Extra ones can be easily trimmed when swapping. Is the
    point of using GC to reduce the perf impact?


The reclaimer/shrinker scans dentries in LRU lists; that is another list.


Ah, you mean a GC that reclaims from the LRU list. I am not sure it could
keep up with the rate at which negative dentries are generated.


Thanks,

Junxiao.

My patch uses the order in hash lists in a very unusual way. Don't be
confused.


There are four lists
parent - siblings
hashtable - hashchain
LRU
inode - alias
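
To make the four lists above concrete, here is a simplified userspace model
of the list linkage in a dentry. The field names follow struct dentry in
include/linux/dcache.h as I remember it from kernels of that period; the
node types and the `_model` names are stand-ins invented for this sketch,
and the real structure has more fields (d_alias, for one, sits in a union).

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the kernel's list node types (illustration only). */
struct list_node_model  { struct list_node_model *next, *prev; };
struct hlist_node_model { struct hlist_node_model *next, **pprev; };

struct dentry_model {
    struct dentry_model *d_parent;
    /* parent - siblings: this dentry's link in the parent's child list */
    struct list_node_model d_child;
    /* head of this dentry's own list of children */
    struct list_node_model d_subdirs;
    /* hashtable - hashchain: link in one bucket of the lookup hash */
    struct hlist_node_model d_hash;
    /* LRU: link in the list scanned by the reclaimer/shrinker */
    struct list_node_model d_lru;
    /* inode - alias: link in the inode's list of dentries */
    struct hlist_node_model d_alias;
};

/* A root dentry is its own parent (this mirrors the kernel's IS_ROOT()). */
int dentry_model_is_root(const struct dentry_model *d)
{
    assert(d != NULL);
    return d->d_parent == d;
}
```

The point for this discussion is that the siblings list, the hash chain and
the LRU are independent orderings of the same objects, so reordering one of
them does not affect the others.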


    Thanks,

    Junxiao.

    >
    > GC could be off by default, or its thresholds set very high (50% of
    > RAM, for example).
    > The final setup could be left to the owners of large systems that
    > need fine tuning.



Re: [PATCH RFC 0/8] dcache: increase poison resistance

2020-12-14 Thread Junxiao Bi

On 12/13/20 11:43 PM, Konstantin Khlebnikov wrote:




On Sun, Dec 13, 2020 at 9:52 PM Junxiao Bi wrote:


On 12/11/20 11:32 PM, Konstantin Khlebnikov wrote:

> On Thu, Dec 10, 2020 at 2:01 AM Junxiao Bi <junxiao...@oracle.com> wrote:
>
>     Hi Konstantin,
>
>     We tested this patch set recently and found that it limits
>     negative dentries to a small part of total memory. We also did
>     not see any performance regression with it. Do you have any plan
>     to integrate it into mainline? It would help a lot with the
>     memory fragmentation caused by the dentry slab. There were a lot
>     of customer cases where sys% was very high because most CPUs were
>     doing memory compaction; the dentry slab was taking too much
>     memory and nearly all of the dentries in it were negative.
>
>
> Right now I don't have any plans for this. I suspect such problems
> will appear much more often since machines are getting bigger.
> So, somebody will take care of it.
We have already had a lot of customer cases. It makes no sense to
leave so many negative dentries in the system: they cause memory
fragmentation and bring little benefit.


Dcache could grow so big only if the system lacks memory pressure.

The simplest solution is a cronjob which provides such pressure by
creating a sparse file on a disk-based fs and then reading it.
This should wash away all inactive caches with no IO and zero chance
of OOM.

Sounds good, will try.


>
> The first part, which collects negative dentries at the end of the
> list of siblings, could be done in a more obvious way by splitting
> the list in two. But this touches much more code.
That would add a new field to dentry?


Yep. Decision is up to maintainers.

>
> The last patch isn't very rigid but makes non-trivial changes.
> Probably it's better to call some garbage collector thingy
> periodically.
> The LRU list needs pressure to age and reorder entries properly.

Swap the negative dentry to the head of the hash list when it gets
accessed? Extra ones can be easily trimmed when swapping. Is the
point of using GC to reduce the perf impact?


The reclaimer/shrinker scans dentries in LRU lists; that is another list.


Ah, you mean a GC that reclaims from the LRU list. I am not sure it could
keep up with the rate at which negative dentries are generated.


Thanks,

Junxiao.

My patch uses the order in hash lists in a very unusual way. Don't be
confused.


There are four lists
parent - siblings
hashtable - hashchain
LRU
inode - alias


Thanks,

Junxiao.

>
> GC could be off by default, or its thresholds set very high (50% of
> RAM, for example).
> The final setup could be left to the owners of large systems that
> need fine tuning.



Re: [PATCH RFC 0/8] dcache: increase poison resistance

2020-12-13 Thread Junxiao Bi

On 12/11/20 11:32 PM, Konstantin Khlebnikov wrote:

On Thu, Dec 10, 2020 at 2:01 AM Junxiao Bi wrote:


Hi Konstantin,

We tested this patch set recently and found that it limits negative
dentries to a small part of total memory. We also did not see any
performance regression with it. Do you have any plan to integrate it
into mainline? It would help a lot with the memory fragmentation caused
by the dentry slab. There were a lot of customer cases where sys% was
very high because most CPUs were doing memory compaction; the dentry
slab was taking too much memory and nearly all of the dentries in it
were negative.


Right now I don't have any plans for this. I suspect such problems will
appear much more often since machines are getting bigger.
So, somebody will take care of it.

We have already had a lot of customer cases. It makes no sense to leave
so many negative dentries in the system: they cause memory fragmentation
and bring little benefit.


The first part, which collects negative dentries at the end of the list
of siblings, could be done in a more obvious way by splitting the list
in two. But this touches much more code.

That would add a new field to dentry?


The last patch isn't very rigid but makes non-trivial changes.
Probably it's better to call some garbage collector thingy periodically.
The LRU list needs pressure to age and reorder entries properly.


Swap the negative dentry to the head of the hash list when it gets
accessed? Extra ones can be easily trimmed when swapping. Is the point
of using GC to reduce the perf impact?


Thanks,

Junxiao.



GC could be off by default, or its thresholds set very high (50% of RAM,
for example).
The final setup could be left to the owners of large systems that need
fine tuning.


Re: [PATCH RFC 0/8] dcache: increase poison resistance

2020-12-09 Thread Junxiao Bi

Hi Konstantin,

We tested this patch set recently and found that it limits negative
dentries to a small part of total memory. We also did not see any
performance regression with it. Do you have any plan to integrate it
into mainline? It would help a lot with the memory fragmentation caused
by the dentry slab. There were a lot of customer cases where sys% was
very high because most CPUs were doing memory compaction; the dentry
slab was taking too much memory and nearly all of the dentries in it
were negative.


The following are the test results from two types of servers: one with
256G of memory and 24 CPUs, the other with 3T of memory and 384 CPUs.
The test case uses a lot of processes to generate negative dentries in
parallel. The numbers below were taken after 72 hours; the negative
dentry count stays stable around these values even when the test runs
longer. Without the patch set, negative dentries took 197G on the 256G
system in less than half an hour, and 2.4T on the 3T system in one day.


        neg-dentry-number    neg-dentry-mem-usage
256G    55259084             10.6G
3T      202306756            38.8G
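
As a sanity check on the table, the per-dentry footprint and the share of
memory can be worked out from the two rows. This is only a sketch: the
function names are made up, and it assumes that "3T" means 3072G and that
"G" is either 10^9 or 2^30 bytes, since the message does not say which.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Average memory per negative dentry for one table row.
 * `unit` is the assumed size of "G" in bytes: 1e9 or 1073741824.0.
 */
double bytes_per_dentry(double mem_g, double unit, unsigned long long count)
{
    assert(count > 0);
    return mem_g * unit / (double)count;
}

/* Share of total memory taken by negative dentries, in percent. */
double share_percent(double used_g, double total_g)
{
    assert(total_g > 0.0);
    return 100.0 * used_g / total_g;
}

/* Print both rows of the table under both readings of "G". */
void print_table_summary(void)
{
    printf("256G: %.1f B (decimal G), %.1f B (binary G), %.1f%% of memory\n",
           bytes_per_dentry(10.6, 1e9, 55259084ULL),
           bytes_per_dentry(10.6, 1073741824.0, 55259084ULL),
           share_percent(10.6, 256.0));
    printf("3T:   %.1f B (decimal G), %.1f B (binary G), %.1f%% of memory\n",
           bytes_per_dentry(38.8, 1e9, 202306756ULL),
           bytes_per_dentry(38.8, 1073741824.0, 202306756ULL),
           share_percent(38.8, 3072.0)); /* 3T taken as 3072G: assumption */
}
```

Both rows come out at roughly 190 to 210 bytes per negative dentry depending
on the unit, and at a few percent of total memory, which is consistent with
the two rows describing the same object size.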

For the perf test, we ran the following and found no regression:

- create 1M negative dentries and then touch them to convert them to
  positive dentries

- create 10K/100K/1M files

- remove 10K/100K/1M files

- kernel compile

To verify the fsnotify fix, we used inotifywait to watch file
create/open events in a directory holding a lot of negative dentries.
Without the patch set the system runs into a soft lockup; with it, there
is no soft lockup.


We also tried to defeat the limit by making different processes generate
negative dentries with the same naming scheme, so that one negative
dentry is accessed several times at around the same time.
DCACHE_REFERENCED is then set on it and it can't be trimmed easily.
We did see negative dentries slowly take all the memory on one of our
systems with 120G of memory. On the two systems above, memory usage
increased but was still a small part of total memory. This looks OK: the
common negative dentry use case is creating some temp files and then
removing them, and it is rare for the same negative dentry to be
accessed at around the same time.


Thanks,

Junxiao.


On 5/8/20 5:23 AM, Konstantin Khlebnikov wrote:

For most filesystems the result of every negative lookup is cached, and the
content of directories is usually cached too. Production of negative dentries
isn't limited by disk speed. It's really easy to generate millions of them if
the system has enough memory.

Getting this memory back isn't that easy because slab frees pages only when
all related objects are gone, while the dcache shrinker works in LRU order.

Typical scenario is an idle system where some process periodically creates
temporary files and removes them. After some time, memory will be filled
with negative dentries for these random file names.

Simple lookup of random names also generates negative dentries very fast.
A constant flow of such negative dentries drains all other inactive caches.

Negative dentries are linked into the siblings list along with normal positive
dentries. Some operations walk the dcache tree but look only for positive
dentries: the most important is fsnotify/inotify. Hordes of negative dentries
slow down these operations significantly.

Time of dentry lookup is usually unaffected because the hash table grows along
with the size of memory, unless somebody deliberately crafts hash collisions.

This patch set solves all of these problems:

Move negative dentries to the end of the siblings list, so walkers can
skip them at first sight (patches 3-6).

Keep in dcache at most three unreferenced negative dentries in a row in each
hash bucket (patches 7-8).

---

Konstantin Khlebnikov (8):
   dcache: show count of hash buckets in sysctl fs.dentry-state
   selftests: add stress testing tool for dcache
   dcache: sweep cached negative dentries to the end of list of siblings
   fsnotify: stop walking child dentries if remaining tail is negative
   dcache: add action D_WALK_SKIP_SIBLINGS to d_walk()
   dcache: stop walking siblings if remaining dentries all negative
   dcache: push releasing dentry lock into sweep_negative
   dcache: prevent flooding with negative dentries


  fs/dcache.c                                        | 144 +++-
  fs/libfs.c                                         |  10 +-
  fs/notify/fsnotify.c                               |   6 +-
  include/linux/dcache.h                             |   6 +
  tools/testing/selftests/filesystems/Makefile       |   1 +
  .../selftests/filesystems/dcache_stress.c          | 210 ++
  6 files changed, 370 insertions(+), 7 deletions(-)
  create mode 100644 tools/testing/selftests/filesystems/dcache_stress.c



[PATCH RFC 0/8] dcache: increase poison resistance

2020-05-08 Thread Konstantin Khlebnikov

For most filesystems the result of every negative lookup is cached, and the
content of directories is usually cached too. Production of negative dentries
isn't limited by disk speed. It's really easy to generate millions of them if
the system has enough memory.

Getting this memory back isn't that easy because slab frees pages only when
all related objects are gone, while the dcache shrinker works in LRU order.

Typical scenario is an idle system where some process periodically creates
temporary files and removes them. After some time, memory will be filled
with negative dentries for these random file names.

Simple lookup of random names also generates negative dentries very fast.
A constant flow of such negative dentries drains all other inactive caches.
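
The lookup case above can be reproduced with a few lines of userspace code.
This is a minimal sketch and not the dcache_stress.c tool from patch 2; the
function name and the ".neg-" file name pattern are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>

/*
 * Look up `count` names that are not expected to exist under `dir`.
 * On filesystems that cache negative lookups, each ENOENT result can leave
 * one negative dentry behind. Returns the number of ENOENT results.
 */
long lookup_missing_names(const char *dir, long count, unsigned int seed)
{
    char path[4096];
    struct stat st;
    long misses = 0;

    assert(dir != NULL);
    for (long i = 0; i < count; i++) {
        /* Small LCG so the sketch carries its own pseudo-random state. */
        seed = seed * 1103515245u + 12345u;
        snprintf(path, sizeof(path), "%s/.neg-%08x-%ld", dir, seed, i);
        if (stat(path, &st) < 0 && errno == ENOENT)
            misses++;
    }
    return misses;
}
```

No file is created and nothing is written, which is why disk speed does not
limit how fast such entries can be produced.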

Negative dentries are linked into the siblings list along with normal positive
dentries. Some operations walk the dcache tree but look only for positive
dentries: the most important is fsnotify/inotify. Hordes of negative dentries
slow down these operations significantly.
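
The cost for such walkers can be shown with a toy model of a siblings list.
It is a sketch only: the types and function names are invented here, and the
second function assumes that all negative entries sit at the tail, which is
the ordering patches 3-6 of this series are described as maintaining.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct child {
    bool negative;          /* true: entry has no inode attached */
    struct child *next;     /* next sibling under the same parent */
};

/* Walk every sibling; *visited reports how many nodes were touched. */
size_t count_positive_full(const struct child *head, size_t *visited)
{
    size_t positive = 0;

    assert(visited != NULL);
    *visited = 0;
    for (const struct child *c = head; c; c = c->next) {
        (*visited)++;
        if (!c->negative)
            positive++;
    }
    return positive;
}

/*
 * Same walk, valid only if negative entries are grouped at the tail:
 * the first negative entry proves that nothing positive remains.
 */
size_t count_positive_tail_sorted(const struct child *head, size_t *visited)
{
    size_t positive = 0;

    assert(visited != NULL);
    *visited = 0;
    for (const struct child *c = head; c; c = c->next) {
        (*visited)++;
        if (c->negative)
            break;
        positive++;
    }
    return positive;
}
```

With two positive entries followed by a thousand negative ones, the first
walk touches every node while the second stops at the third.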

Time of dentry lookup is usually unaffected because the hash table grows along
with the size of memory, unless somebody deliberately crafts hash collisions.

This patch set solves all of these problems:

Move negative dentries to the end of the siblings list, so walkers can
skip them at first sight (patches 3-6).

Keep in dcache at most three unreferenced negative dentries in a row in each
hash bucket (patches 7-8).
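
The per-bucket limit can be pictured with a userspace model of one hash
chain. This is a sketch of the stated policy, not the code from patches 7-8:
the types and names are invented, the `referenced` flag merely stands in for
DCACHE_REFERENCED, and letting a referenced or positive entry end a run is
an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_NEG_RUN 3   /* "at most three ... in a row", from the text above */

struct entry {
    bool negative;      /* no inode attached */
    bool referenced;    /* stand-in for DCACHE_REFERENCED */
    struct entry *next; /* next entry in the same hash chain */
};

/*
 * Unlink entries that would extend a run of unreferenced negative entries
 * beyond MAX_NEG_RUN. Returns the number of entries unlinked.
 * Unlinked nodes are not freed; the caller owns their memory.
 */
size_t trim_negative_runs(struct entry **chain)
{
    size_t run = 0, trimmed = 0;
    struct entry **link = chain;

    assert(chain != NULL);
    while (*link) {
        struct entry *e = *link;

        if (e->negative && !e->referenced) {
            if (run == MAX_NEG_RUN) {
                *link = e->next;    /* drop the surplus entry */
                e->next = NULL;
                trimmed++;
                continue;
            }
            run++;
        } else {
            run = 0;                /* positive or referenced: run ends */
        }
        link = &e->next;
    }
    return trimmed;
}
```

Because the limit applies per bucket, the total number of surviving
unreferenced negative entries is bounded by the number of hash buckets,
which is presumably why patch 1 exposes the bucket count in
fs.dentry-state.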

---

Konstantin Khlebnikov (8):
  dcache: show count of hash buckets in sysctl fs.dentry-state
  selftests: add stress testing tool for dcache
  dcache: sweep cached negative dentries to the end of list of siblings
  fsnotify: stop walking child dentries if remaining tail is negative
  dcache: add action D_WALK_SKIP_SIBLINGS to d_walk()
  dcache: stop walking siblings if remaining dentries all negative
  dcache: push releasing dentry lock into sweep_negative
  dcache: prevent flooding with negative dentries


 fs/dcache.c                                        | 144 +++-
 fs/libfs.c                                         |  10 +-
 fs/notify/fsnotify.c                               |   6 +-
 include/linux/dcache.h                             |   6 +
 tools/testing/selftests/filesystems/Makefile       |   1 +
 .../selftests/filesystems/dcache_stress.c          | 210 ++
 6 files changed, 370 insertions(+), 7 deletions(-)
 create mode 100644 tools/testing/selftests/filesystems/dcache_stress.c
