Re: [PATCH 2/3] writeback: allow for dirty metadata accounting

2016-09-12 Thread Dave Chinner

On Mon, Sep 12, 2016 at 10:56:04AM -0400, Josef Bacik wrote:
> I think that looping through all the sb's in the system would be
> kinda shitty for this tho, we want the "get number of dirty pages"
> part to be relatively fast.  What if I do something like the
> shrinker_control only for dirty objects. So the fs registers some
> dirty_objects_control, we call into each of those and get the counts
> from that.  Does that sound less crappy?  Thanks,

Hmmm - just an off-the-wall thought on this

If you're going to do that, then why wouldn't you simply use a
"shrinker" to do the metadata writeback rather than having a hook to
count dirty objects to pass to some other writeback code that calls
a hook to write the metadata?

That way filesystems can also implement dirty accounting and
"writers" for each cache of objects they currently implement
shrinkers for. i.e. just expanding shrinkers to be able to "count
dirty objects" and "write dirty objects" so that we can tell
filesystems to write back all their different metadata caches
proportionally to the size of the page cache and its dirty state.
The existing file data and inode writeback could then just be new
generic "superblock shrinker" operations, and the fs could have its
own private metadata writeback similar to the private sb shrinker
callout we currently have...

And, in doing so, we might be able to completely hide memcg from the
writeback implementations similar to the way memcg is completely
hidden from the shrinker reclaim implementations...
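
A minimal userspace sketch of the idea above, assuming a hypothetical
`struct dirty_writer` with "count dirty objects" and "write dirty objects"
callbacks modelled loosely on the count/scan split of the shrinker
interface. None of these names exist in the kernel, and as a simplification
the writeback target is split in proportion to each cache's dirty count
rather than to page cache size:

```c
#include <stddef.h>

/* Hypothetical analogue of a shrinker: one instance per metadata cache. */
struct dirty_writer {
	/* Returns the number of dirty objects currently in the cache. */
	unsigned long (*count_dirty)(void *cache);
	/* Writes back up to nr objects; returns how many were written. */
	unsigned long (*write_dirty)(void *cache, unsigned long nr);
	void *cache;
};

/*
 * Split nr_to_write across all caches in proportion to each cache's
 * share of the total dirty objects. Returns the number written.
 */
unsigned long writeback_proportional(struct dirty_writer *w, size_t n,
				     unsigned long nr_to_write)
{
	unsigned long total = 0, written = 0;
	size_t i;

	for (i = 0; i < n; i++)
		total += w[i].count_dirty(w[i].cache);
	if (total == 0)
		return 0;

	for (i = 0; i < n; i++) {
		unsigned long dirty = w[i].count_dirty(w[i].cache);
		unsigned long share = nr_to_write * dirty / total;

		written += w[i].write_dirty(w[i].cache, share);
	}
	return written;
}

/* Toy cache used only to exercise the interface. */
struct toy_cache {
	unsigned long dirty;
};

unsigned long toy_count(void *c)
{
	return ((struct toy_cache *)c)->dirty;
}

unsigned long toy_write(void *c, unsigned long nr)
{
	struct toy_cache *tc = c;
	unsigned long n = nr < tc->dirty ? nr : tc->dirty;

	tc->dirty -= n;	/* pretend the objects were cleaned */
	return n;
}
```

With two toy caches holding 300 and 100 dirty objects, a request to write
40 objects asks the first cache for 30 and the second for 10.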

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 2/3] writeback: allow for dirty metadata accounting

2016-09-12 Thread Dave Chinner

On Mon, Sep 12, 2016 at 10:24:12AM -0400, Josef Bacik wrote:
> Dave your reply got eaten somewhere along the way for me, so all i
> have is this email.  I'm going to respond to your stuff here.

No worries, I'll do a 2-in-1 reply :P

> On 09/12/2016 03:34 AM, Jan Kara wrote:
> >On Mon 12-09-16 10:46:56, Dave Chinner wrote:
> >>On Fri, Sep 09, 2016 at 10:17:43AM +0200, Jan Kara wrote:
> >>>On Mon 22-08-16 13:35:01, Josef Bacik wrote:
> >>>>Provide a mechanism for file systems to indicate how much dirty metadata they
> >>>>are holding.  This introduces a few things
> >>>>
> >>>>1) Zone stats for dirty metadata, which is the same as the NR_FILE_DIRTY.
> >>>>2) WB stat for dirty metadata.  This way we know if we need to try and call into
> >>>>the file system to write out metadata.  This could potentially be used in the
> >>>>future to make balancing of dirty pages smarter.
> >>>
> >>>So I'm curious about one thing: In the previous posting you have mentioned
> >>>that the main motivation for this work is to have a simple support for
> >>>sub-pagesize dirty metadata blocks that need tracking in btrfs. However you
> >>>do the dirty accounting at page granularity. What are your plans to handle
> >>>this mismatch?
> >>>
> >>>The thing is you actually shouldn't miscount by too much as that could
> >>>upset some checks in mm checking how much dirty pages a node has directing
> >>>how reclaim should be done... But it's a question whether NR_METADATA_DIRTY
> >>>should be actually used in the checks in node_limits_ok() or in
> >>>node_pagecache_reclaimable() at all because once you start accounting dirty
> >>>slab objects, you are really on a thin ice...
> >>
> >>The other thing I'm concerned about is that it's a btrfs-only thing,
> >>which means having dirty btrfs metadata on a system with different
> >>filesystems (e.g. btrfs root/home, XFS data) is going to affect how
> >>memory balance and throttling is run on other filesystems. i.e. it's
> >>going to make a filesystem specific issue into a problem that
> >>affects global behaviour.
> >
> >Umm, I don't think it will be different than currently. Currently btrfs
> >dirty metadata is accounted as dirty page cache because they have this
> >virtual fs inode which owns all metadata pages.

Yes, so effectively they are treated the same as file data pages
w.r.t. throttling, writeback and reclaim

> >It is pretty similar to
> >e.g. ext2 where you have bdev inode which effectively owns all metadata
> >pages and these dirty pages account towards the dirty limits. For ext4
> >things are more complicated due to journaling and thus ext4 hides the fact
> >that a metadata page is dirty until the corresponding transaction is
> >committed.  But from that moment on dirty metadata is again just a dirty
> >pagecache page in the bdev inode.

Yeah, though those filesystems don't suffer from the uncontrolled
explosion of metadata that btrfs is suffering from, so simply
treating them as another dirty inode that needs flushing works just
fine.

> >So current Josef's patch just splits the counter in which btrfs metadata
> >pages would be accounted but effectively there should be no change in the
> >behavior.

Yup, I missed the addition to the node_pagecache_reclaimable() that
ensures reclaim sees the same number of dirty pages...

> >It is just a question whether this approach is workable in the
> >future when they'd like to track different objects than just pages in the
> >counter.

I don't think it can. Because the counters directly influence the
page lru reclaim scanning algorithms, it can only be used to
account for pages that are in the LRUs. Other objects like slab
objects need to be accounted for and reclaimed by the shrinker
infrastructure.

Accounting for metadata writeback is a different issue - it could
track slab objects if we wanted to, but the issue is that it is
often difficult to determine the amount of IO needed to clean them,
so generic balancing is hard (e.g. the effect of inode write
clustering).

> +1 to what Jan said.  Btrfs's dirty metadata is always going to
> affect any other file systems in the system, no matter how we deal
> with it.  In fact it's worse with our btree_inode approach as the
> dirtied_when thing will likely screw somebody and make us skip
> writing out dirty metadata when we want to.

XFS takes care of metadata flushing with a periodic background work
controlled by /proc/sys/fs/xfs/xfssyncd_centisecs. We trigger both
background async inode reclaim and background dirty metadata
flushing from this (run on workqueues) if the system is idle or
hasn't had some other trigger fire to run these sooner.  It works
well enough that I can't remember the last time someone asked a
question about needing to tune this parameter, or had a problem that
required tuning it to fix

> At least with this
> framework in place we can start to make the throttling smarter, so
> say make us flush metadata if that is the bigger % of the dirty
>

Re: [PATCH 2/3] writeback: allow for dirty metadata accounting

2016-09-12 Thread Josef Bacik


On 09/09/2016 04:17 AM, Jan Kara wrote:
> On Mon 22-08-16 13:35:01, Josef Bacik wrote:
>> Provide a mechanism for file systems to indicate how much dirty metadata they
>> are holding.  This introduces a few things
>>
>> 1) Zone stats for dirty metadata, which is the same as the NR_FILE_DIRTY.
>> 2) WB stat for dirty metadata.  This way we know if we need to try and call into
>> the file system to write out metadata.  This could potentially be used in the
>> future to make balancing of dirty pages smarter.
>
> So I'm curious about one thing: In the previous posting you have mentioned
> that the main motivation for this work is to have a simple support for
> sub-pagesize dirty metadata blocks that need tracking in btrfs. However you
> do the dirty accounting at page granularity. What are your plans to handle
> this mismatch?


We already track how much dirty metadata we have internally in btrfs, I 
envisioned the subpage blocksize guys just calling the accounting every N objects
that were dirtied in order to keep the accounting correct.  This is not great,
but it was better than the hoops we needed to jump through to deal with the 
btree_inode and subpagesize blocksizes.
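
A minimal sketch of the "account every N dirtied objects" idea, assuming N
is the number of metadata blocks per page. The struct and function names
are hypothetical and `pages_accounted` merely stands in for the
page-granular dirty counter:

```c
/* Hypothetical per-filesystem state for batching sub-page accounting. */
struct subpage_acct {
	unsigned int blocks_per_page;	/* N = page size / block size */
	unsigned int pending;		/* dirtied blocks not yet accounted */
	unsigned long pages_accounted;	/* stand-in for the page counter */
};

void subpage_acct_init(struct subpage_acct *a, unsigned int page_size,
		       unsigned int block_size)
{
	a->blocks_per_page = page_size / block_size;
	a->pending = 0;
	a->pages_accounted = 0;
}

/* Record one dirtied block; bump the page counter once per N blocks. */
void subpage_block_dirtied(struct subpage_acct *a)
{
	if (++a->pending == a->blocks_per_page) {
		a->pending = 0;
		a->pages_accounted++;
	}
}
```

With 64K pages and 4K blocks, N is 16, so 40 dirtied blocks account for two
pages and leave eight blocks pending. That pending remainder is the
miscount the page-granular counter cannot see.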




> The thing is you actually shouldn't miscount by too much as that could
> upset some checks in mm checking how much dirty pages a node has directing
> how reclaim should be done... But it's a question whether NR_METADATA_DIRTY
> should be actually used in the checks in node_limits_ok() or in
> node_pagecache_reclaimable() at all because once you start accounting dirty
> slab objects, you are really on a thin ice...


Agreed, this does get a bit ugly.




>> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
>> index 56c8fda..d329f89 100644
>> --- a/fs/fs-writeback.c
>> +++ b/fs/fs-writeback.c
>> @@ -1809,6 +1809,7 @@ static unsigned long get_nr_dirty_pages(void)
>>  {
>>  	return global_node_page_state(NR_FILE_DIRTY) +
>>  		global_node_page_state(NR_UNSTABLE_NFS) +
>> +		global_node_page_state(NR_METADATA_DIRTY) +
>>  		get_nr_dirty_inodes();
>
> With my question is also connected this - when we have NR_METADATA_DIRTY,
> we could just account dirty inodes there and get rid of this
> get_nr_dirty_inodes() hack...
>
> But actually getting this to work right to be able to track dirty inodes would
> be useful on its own - some throttling of creation of dirty inodes would be
> useful for several filesystems (ext4, xfs, ...).


So I suppose what I could do is instead provide a callback for the vm to ask how 
many dirty objects we have in the file system, instead of adding another page 
counter.  That way the actual accounting is kept internal to the file system, 
and it gets rid of the weird mismatch when blocksize < pagesize.  Does that 
sound like a more acceptable approach?  Unfortunately I decided to do this work 
to make the blocksize < pagesize work easier, but then didn't actually think 
about how the accounting would interact with that case, because I'm an idiot.


I think that looping through all the sb's in the system would be kinda shitty 
for this tho, we want the "get number of dirty pages" part to be relatively 
fast.  What if I do something like the shrinker_control only for dirty objects. 
So the fs registers some dirty_objects_control, we call into each of those and 
get the counts from that.  Does that sound less crappy?  Thanks,


Josef
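
A rough userspace sketch of the registration idea, assuming a hypothetical
`struct dirty_objects_control` with a single count callback. The names are
illustrative only, and a real kernel version would need locking around the
list:

```c
#include <stddef.h>

/* Hypothetical: one entry per filesystem that tracks dirty objects. */
struct dirty_objects_control {
	unsigned long (*count_dirty)(void *priv);
	void *priv;
	struct dirty_objects_control *next;
};

/* Registered controls; walked instead of every superblock. */
static struct dirty_objects_control *doc_list;

void register_dirty_objects_control(struct dirty_objects_control *doc)
{
	doc->next = doc_list;
	doc_list = doc;
}

/* Sum the dirty object counts reported by every registered control. */
unsigned long nr_dirty_objects(void)
{
	struct dirty_objects_control *doc;
	unsigned long total = 0;

	for (doc = doc_list; doc; doc = doc->next)
		total += doc->count_dirty(doc->priv);
	return total;
}

/* Example callback: priv points at a plain counter kept by the fs. */
unsigned long count_from_counter(void *priv)
{
	return *(unsigned long *)priv;
}
```

Only filesystems that register are visited, so the cost of the count scales
with the number of registered controls rather than with the number of
mounted superblocks.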

Re: [PATCH 2/3] writeback: allow for dirty metadata accounting

2016-09-12 Thread Josef Bacik

Dave your reply got eaten somewhere along the way for me, so all i have is this 
email.  I'm going to respond to your stuff here.


On 09/12/2016 03:34 AM, Jan Kara wrote:
> On Mon 12-09-16 10:46:56, Dave Chinner wrote:
>> On Fri, Sep 09, 2016 at 10:17:43AM +0200, Jan Kara wrote:
>>> On Mon 22-08-16 13:35:01, Josef Bacik wrote:
>>>> Provide a mechanism for file systems to indicate how much dirty metadata they
>>>> are holding.  This introduces a few things
>>>>
>>>> 1) Zone stats for dirty metadata, which is the same as the NR_FILE_DIRTY.
>>>> 2) WB stat for dirty metadata.  This way we know if we need to try and call into
>>>> the file system to write out metadata.  This could potentially be used in the
>>>> future to make balancing of dirty pages smarter.
>>>
>>> So I'm curious about one thing: In the previous posting you have mentioned
>>> that the main motivation for this work is to have a simple support for
>>> sub-pagesize dirty metadata blocks that need tracking in btrfs. However you
>>> do the dirty accounting at page granularity. What are your plans to handle
>>> this mismatch?
>>>
>>> The thing is you actually shouldn't miscount by too much as that could
>>> upset some checks in mm checking how much dirty pages a node has directing
>>> how reclaim should be done... But it's a question whether NR_METADATA_DIRTY
>>> should be actually used in the checks in node_limits_ok() or in
>>> node_pagecache_reclaimable() at all because once you start accounting dirty
>>> slab objects, you are really on a thin ice...


>> The other thing I'm concerned about is that it's a btrfs-only thing,
>> which means having dirty btrfs metadata on a system with different
>> filesystems (e.g. btrfs root/home, XFS data) is going to affect how
>> memory balance and throttling is run on other filesystems. i.e. it's
>> going to make a filesystem specific issue into a problem that
>> affects global behaviour.


> Umm, I don't think it will be different than currently. Currently btrfs
> dirty metadata is accounted as dirty page cache because they have this
> virtual fs inode which owns all metadata pages. It is pretty similar to
> e.g. ext2 where you have bdev inode which effectively owns all metadata
> pages and these dirty pages account towards the dirty limits. For ext4
> things are more complicated due to journaling and thus ext4 hides the fact
> that a metadata page is dirty until the corresponding transaction is
> committed.  But from that moment on dirty metadata is again just a dirty
> pagecache page in the bdev inode.
>
> So current Josef's patch just splits the counter in which btrfs metadata
> pages would be accounted but effectively there should be no change in the
> behavior. It is just a question whether this approach is workable in the
> future when they'd like to track different objects than just pages in the
> counter.


+1 to what Jan said.  Btrfs's dirty metadata is always going to affect any other 
file systems in the system, no matter how we deal with it.  In fact it's worse 
with our btree_inode approach as the dirtied_when thing will likely screw 
somebody and make us skip writing out dirty metadata when we want to.  At least 
with this framework in place we can start to make the throttling smarter, so say 
make us flush metadata if that is the bigger % of the dirty pages in the system.
All I do now is move the status quo around, we are no worse, and arguably
better with these patches than we were without them.
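
A minimal sketch of what "flush metadata if that is the bigger % of the
dirty pages" could look like once the two counters are separate. The helper
names and the 50% cut-off are assumptions for illustration, not anything in
the patch:

```c
#include <stdbool.h>

/* Share of all dirty pages that is metadata, in percent (0 if none dirty). */
unsigned int metadata_dirty_pct(unsigned long file_dirty,
				unsigned long metadata_dirty)
{
	unsigned long total = file_dirty + metadata_dirty;

	if (total == 0)
		return 0;
	return (unsigned int)(metadata_dirty * 100 / total);
}

/* Prefer metadata writeback when metadata is the larger share. */
bool prefer_metadata_flush(unsigned long file_dirty,
			   unsigned long metadata_dirty)
{
	return metadata_dirty_pct(file_dirty, metadata_dirty) > 50;
}
```

The inputs here would correspond to the NR_FILE_DIRTY and NR_METADATA_DIRTY
totals; with a single combined counter this decision cannot be made at all.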





>>>> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
>>>> index 56c8fda..d329f89 100644
>>>> --- a/fs/fs-writeback.c
>>>> +++ b/fs/fs-writeback.c
>>>> @@ -1809,6 +1809,7 @@ static unsigned long get_nr_dirty_pages(void)
>>>>  {
>>>>  	return global_node_page_state(NR_FILE_DIRTY) +
>>>>  		global_node_page_state(NR_UNSTABLE_NFS) +
>>>> +		global_node_page_state(NR_METADATA_DIRTY) +
>>>>  		get_nr_dirty_inodes();
>>>
>>> With my question is also connected this - when we have NR_METADATA_DIRTY,
>>> we could just account dirty inodes there and get rid of this
>>> get_nr_dirty_inodes() hack...
>>
>> Accounting of dirty inodes would have to be applied to every
>> filesystem before that could be done, but
>
> Well, this particular hack of adding get_nr_dirty_inodes() into the result
> of get_nr_dirty_pages() is there only so that we do writeback even if there
> are only dirty inodes without dirty pages. Since XFS doesn't care about
> writeback for dirty inodes, it would be fine regardless what we do here,
> won't it?
>
>>> But actually getting this to work right to be able to track dirty inodes would
>>> be useful on its own - some throttling of creation of dirty inodes would be
>>> useful for several filesystems (ext4, xfs, ...).
>>
>> ... this relies on the VFS being able to track and control all
>> dirtying of inodes and metadata.
>>
>> Which, it should be noted, cannot be done unconditionally because
>> some filesystems /explicitly avoid/ dirtying VFS inodes for anything
>> other than dirty data and provide no mechanism to the VFS for
>> writeback inodes or their related metadata. e.g. XFS, where all
>> metadata changes are transactional and so all dirty inode tracking
>> and writeback control is

Re: [PATCH 2/3] writeback: allow for dirty metadata accounting

2016-09-12 Thread Jan Kara

On Mon 12-09-16 10:46:56, Dave Chinner wrote:
> On Fri, Sep 09, 2016 at 10:17:43AM +0200, Jan Kara wrote:
> > On Mon 22-08-16 13:35:01, Josef Bacik wrote:
> > > Provide a mechanism for file systems to indicate how much dirty metadata they
> > > are holding.  This introduces a few things
> > > 
> > > 1) Zone stats for dirty metadata, which is the same as the NR_FILE_DIRTY.
> > > 2) WB stat for dirty metadata.  This way we know if we need to try and call into
> > > the file system to write out metadata.  This could potentially be used in the
> > > future to make balancing of dirty pages smarter.
> > 
> > So I'm curious about one thing: In the previous posting you have mentioned
> > that the main motivation for this work is to have a simple support for
> > sub-pagesize dirty metadata blocks that need tracking in btrfs. However you
> > do the dirty accounting at page granularity. What are your plans to handle
> > this mismatch?
> > 
> > The thing is you actually shouldn't miscount by too much as that could
> > upset some checks in mm checking how much dirty pages a node has directing
> > how reclaim should be done... But it's a question whether NR_METADATA_DIRTY
> > should be actually used in the checks in node_limits_ok() or in
> > node_pagecache_reclaimable() at all because once you start accounting dirty
> > slab objects, you are really on a thin ice...
> 
> The other thing I'm concerned about is that it's a btrfs-only thing,
> which means having dirty btrfs metadata on a system with different
> filesystems (e.g. btrfs root/home, XFS data) is going to affect how
> memory balance and throttling is run on other filesystems. i.e. it's
> going to make a filesystem specific issue into a problem that
> affects global behaviour.

Umm, I don't think it will be different than currently. Currently btrfs
dirty metadata is accounted as dirty page cache because they have this
virtual fs inode which owns all metadata pages. It is pretty similar to
e.g. ext2 where you have bdev inode which effectively owns all metadata
pages and these dirty pages account towards the dirty limits. For ext4
things are more complicated due to journaling and thus ext4 hides the fact
that a metadata page is dirty until the corresponding transaction is
committed.  But from that moment on dirty metadata is again just a dirty
pagecache page in the bdev inode.

So current Josef's patch just splits the counter in which btrfs metadata
pages would be accounted but effectively there should be no change in the
behavior. It is just a question whether this approach is workable in the
future when they'd like to track different objects than just pages in the
counter.

> > > diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> > > index 56c8fda..d329f89 100644
> > > --- a/fs/fs-writeback.c
> > > +++ b/fs/fs-writeback.c
> > > @@ -1809,6 +1809,7 @@ static unsigned long get_nr_dirty_pages(void)
> > >  {
> > >   return global_node_page_state(NR_FILE_DIRTY) +
> > >   global_node_page_state(NR_UNSTABLE_NFS) +
> > > + global_node_page_state(NR_METADATA_DIRTY) +
> > >   get_nr_dirty_inodes();
> > 
> > With my question is also connected this - when we have NR_METADATA_DIRTY,
> > we could just account dirty inodes there and get rid of this
> > get_nr_dirty_inodes() hack...
> 
> Accounting of dirty inodes would have to be applied to every
> filesystem before that could be done, but

Well, this particular hack of adding get_nr_dirty_inodes() into the result
of get_nr_dirty_pages() is there only so that we do writeback even if there
are only dirty inodes without dirty pages. Since XFS doesn't care about
writeback for dirty inodes, it would be fine regardless what we do here,
won't it?
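
To make the point about the hack concrete, here is a small userspace model
of the sum computed by get_nr_dirty_pages() with the patch applied. The
struct and function are stand-ins for illustration; the real function reads
global node counters instead of taking arguments:

```c
/* Stand-in for the four terms summed in get_nr_dirty_pages(). */
struct dirty_counts {
	unsigned long file_dirty;	/* NR_FILE_DIRTY */
	unsigned long unstable_nfs;	/* NR_UNSTABLE_NFS */
	unsigned long metadata_dirty;	/* NR_METADATA_DIRTY (added by the patch) */
	unsigned long dirty_inodes;	/* get_nr_dirty_inodes() */
};

/* Nonzero whenever anything at all is dirty, pages or inodes. */
unsigned long model_nr_dirty_pages(const struct dirty_counts *c)
{
	return c->file_dirty + c->unstable_nfs +
	       c->metadata_dirty + c->dirty_inodes;
}
```

With no dirty pages but three dirty inodes the result is still 3, which is
what keeps writeback from being skipped in that case.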

> > But actually getting this to work right to be able to track dirty inodes would
> > be useful on its own - some throttling of creation of dirty inodes would be
> > useful for several filesystems (ext4, xfs, ...).
> 
> ... this relies on the VFS being able to track and control all
> dirtying of inodes and metadata.
> 
> Which, it should be noted, cannot be done unconditionally because
> some filesystems /explicitly avoid/ dirtying VFS inodes for anything
> other than dirty data and provide no mechanism to the VFS for
> writeback inodes or their related metadata. e.g. XFS, where all
> metadata changes are transactional and so all dirty inode tracking
> and writeback control is internal to the XFS transaction
> subsystem.
> 
> Adding an external throttle to dirtying of metadata doesn't make any
> sense in this sort of architecture - in XFS we already have all the
> throttles and expedited writeback triggers integrated into the
> transaction subsystem (e.g transaction reservation limits, log space
> limits, periodic background writeback, memory reclaim triggers,
> etc). It's all so tightly integrated around the physical structure
> of the filesystem I can't see any way to sanely abstract it to work
> with a generic "dirty list"

Re: [PATCH 2/3] writeback: allow for dirty metadata accounting

2016-09-11 Thread Dave Chinner

On Fri, Sep 09, 2016 at 10:17:43AM +0200, Jan Kara wrote:
> On Mon 22-08-16 13:35:01, Josef Bacik wrote:
> > Provide a mechanism for file systems to indicate how much dirty metadata they
> > are holding.  This introduces a few things
> > 
> > 1) Zone stats for dirty metadata, which is the same as the NR_FILE_DIRTY.
> > 2) WB stat for dirty metadata.  This way we know if we need to try and call into
> > the file system to write out metadata.  This could potentially be used in the
> > future to make balancing of dirty pages smarter.
> 
> So I'm curious about one thing: In the previous posting you have mentioned
> that the main motivation for this work is to have a simple support for
> sub-pagesize dirty metadata blocks that need tracking in btrfs. However you
> do the dirty accounting at page granularity. What are your plans to handle
> this mismatch?
> 
> The thing is you actually shouldn't miscount by too much as that could
> upset some checks in mm checking how much dirty pages a node has directing
> how reclaim should be done... But it's a question whether NR_METADATA_DIRTY
> should be actually used in the checks in node_limits_ok() or in
> node_pagecache_reclaimable() at all because once you start accounting dirty
> slab objects, you are really on a thin ice...

The other thing I'm concerned about is that it's a btrfs-only thing,
which means having dirty btrfs metadata on a system with different
filesystems (e.g. btrfs root/home, XFS data) is going to affect how
memory balance and throttling is run on other filesystems. i.e. it's
going to make a filesystem specific issue into a problem that
affects global behaviour.
> 
> > diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> > index 56c8fda..d329f89 100644
> > --- a/fs/fs-writeback.c
> > +++ b/fs/fs-writeback.c
> > @@ -1809,6 +1809,7 @@ static unsigned long get_nr_dirty_pages(void)
> >  {
> >  	return global_node_page_state(NR_FILE_DIRTY) +
> >  		global_node_page_state(NR_UNSTABLE_NFS) +
> > +		global_node_page_state(NR_METADATA_DIRTY) +
> >  		get_nr_dirty_inodes();
> 
> With my question is also connected this - when we have NR_METADATA_DIRTY,
> we could just account dirty inodes there and get rid of this
> get_nr_dirty_inodes() hack...

Accounting of dirty inodes would have to be applied to every
filesystem before that could be done, but

> But actually getting this to work right to be able to track dirty inodes would
> be useful on its own - some throttling of creation of dirty inodes would be
> useful for several filesystems (ext4, xfs, ...).

... this relies on the VFS being able to track and control all
dirtying of inodes and metadata.

Which, it should be noted, cannot be done unconditionally because
some filesystems /explicitly avoid/ dirtying VFS inodes for anything
other than dirty data and provide no mechanism to the VFS for
writeback inodes or their related metadata. e.g. XFS, where all
metadata changes are transactional and so all dirty inode tracking
and writeback control is internal to the XFS transaction
subsystem.

Adding an external throttle to dirtying of metadata doesn't make any
sense in this sort of architecture - in XFS we already have all the
throttles and expedited writeback triggers integrated into the
transaction subsystem (e.g transaction reservation limits, log space
limits, periodic background writeback, memory reclaim triggers,
etc). It's all so tightly integrated around the physical structure
of the filesystem I can't see any way to sanely abstract it to work
with a generic "dirty list" accounting and writeback engine at this
point...

I can see how tracking of information such as the global amount of
dirty metadata is useful for diagnostics, but I'm not convinced we
should be using it for globally scoped external control of deeply
integrated and highly specific internal filesystem functionality.
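
For the diagnostic use case, a small sketch of pulling one field out of
/proc/meminfo-style text. It assumes the "MetadataDirty:" line proposed in
this patch is present, which is only true on a kernel carrying the patch;
the helper name is hypothetical:

```c
#include <stdio.h>
#include <string.h>

/*
 * Look up "field:" at the start of a line in meminfo-style text.
 * Returns 0 and stores the value (in kB) on success, -1 if absent.
 */
int meminfo_field_kb(const char *text, const char *field, unsigned long *kb)
{
	size_t len = strlen(field);
	const char *line = text;

	while (line && *line) {
		if (strncmp(line, field, len) == 0 && line[len] == ':')
			return sscanf(line + len + 1, "%lu", kb) == 1 ? 0 : -1;
		line = strchr(line, '\n');
		if (line)
			line++;	/* step past the newline to the next line */
	}
	return -1;
}
```

A monitoring tool could read /proc/meminfo into a buffer and call this with
"MetadataDirty"; a return of -1 simply means the running kernel does not
export the field.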

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com

Re: [PATCH 2/3] writeback: allow for dirty metadata accounting

2016-09-09 Thread Jan Kara

On Mon 22-08-16 13:35:01, Josef Bacik wrote:
> Provide a mechanism for file systems to indicate how much dirty metadata they
> are holding.  This introduces a few things
> 
> 1) Zone stats for dirty metadata, which is the same as the NR_FILE_DIRTY.
> 2) WB stat for dirty metadata.  This way we know if we need to try and call into
> the file system to write out metadata.  This could potentially be used in the
> future to make balancing of dirty pages smarter.

So I'm curious about one thing: In the previous posting you have mentioned
that the main motivation for this work is to have a simple support for
sub-pagesize dirty metadata blocks that need tracking in btrfs. However you
do the dirty accounting at page granularity. What are your plans to handle
this mismatch?

The thing is you actually shouldn't miscount by too much as that could
upset some checks in mm checking how much dirty pages a node has directing
how reclaim should be done... But it's a question whether NR_METADATA_DIRTY
should be actually used in the checks in node_limits_ok() or in
node_pagecache_reclaimable() at all because once you start accounting dirty
slab objects, you are really on a thin ice...

> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index 56c8fda..d329f89 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -1809,6 +1809,7 @@ static unsigned long get_nr_dirty_pages(void)
>  {
>   return global_node_page_state(NR_FILE_DIRTY) +
>   global_node_page_state(NR_UNSTABLE_NFS) +
> + global_node_page_state(NR_METADATA_DIRTY) +
>   get_nr_dirty_inodes();

With my question is also connected this - when we have NR_METADATA_DIRTY,
we could just account dirty inodes there and get rid of this
get_nr_dirty_inodes() hack...

But actually getting this to work right to be able to track dirty inodes would
be useful on its own - some throttling of creation of dirty inodes would be
useful for several filesystems (ext4, xfs, ...).

> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index 121a6e3..6a52723 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -506,6 +506,7 @@ bool node_dirty_ok(struct pglist_data *pgdat)
>   nr_pages += node_page_state(pgdat, NR_FILE_DIRTY);
>   nr_pages += node_page_state(pgdat, NR_UNSTABLE_NFS);
>   nr_pages += node_page_state(pgdat, NR_WRITEBACK);
> + nr_pages += node_page_state(pgdat, NR_METADATA_DIRTY);
>  
>   return nr_pages <= limit;
>  }
> @@ -1595,7 +1596,8 @@ static void balance_dirty_pages(struct bdi_writeback *wb,
>* been flushed to permanent storage.
>*/
>   nr_reclaimable = global_node_page_state(NR_FILE_DIRTY) +
> - global_node_page_state(NR_UNSTABLE_NFS);
> + global_node_page_state(NR_UNSTABLE_NFS) +
> + global_node_page_state(NR_METADATA_DIRTY);
>   gdtc->avail = global_dirtyable_memory();
>   gdtc->dirty = nr_reclaimable + global_node_page_state(NR_WRITEBACK);
>  
> @@ -1935,7 +1937,8 @@ bool wb_over_bg_thresh(struct bdi_writeback *wb)
>*/
>   gdtc->avail = global_dirtyable_memory();
>   gdtc->dirty = global_node_page_state(NR_FILE_DIRTY) +
> -   global_node_page_state(NR_UNSTABLE_NFS);
> +   global_node_page_state(NR_UNSTABLE_NFS) +
> +   global_node_page_state(NR_METADATA_DIRTY);
>   domain_dirty_limits(gdtc);
>  
>   if (gdtc->dirty > gdtc->bg_thresh)
> @@ -2009,7 +2012,8 @@ void laptop_mode_timer_fn(unsigned long data)
>  {
>   struct request_queue *q = (struct request_queue *)data;
>   int nr_pages = global_node_page_state(NR_FILE_DIRTY) +
> - global_node_page_state(NR_UNSTABLE_NFS);
> + global_node_page_state(NR_UNSTABLE_NFS) +
> + global_node_page_state(NR_METADATA_DIRTY);
>   struct bdi_writeback *wb;
>  
>   /*
> @@ -2473,6 +2477,96 @@ void account_page_dirtied(struct page *page, struct address_space *mapping)
>  EXPORT_SYMBOL(account_page_dirtied);
>  
>  /*
> + * account_metadata_dirtied
> + * @page - the page being dirtied
> + * @bdi - the bdi that owns this page
> + *
> + * Do the dirty page accounting for metadata pages that aren't backed by an
> + * address_space.
> + */
> +void account_metadata_dirtied(struct page *page, struct backing_dev_info *bdi)
> +{
> + unsigned long flags;
> +

A bdi_cap_account_dirty() check here and in following functions?

> + local_irq_save(flags);
> + __inc_node_page_state(page, NR_METADATA_DIRTY);
> + __inc_zone_page_state(page, NR_ZONE_WRITE_PENDING);
> + __inc_node_page_state(page, NR_DIRTIED);
> + __inc_wb_stat(&bdi->wb, WB_RECLAIMABLE);
> + __inc_wb_stat(&bdi->wb, WB_DIRTIED);
> + __inc_wb_stat(&bdi->wb, WB_METADATA_DIRTY);
> + current->nr_dirtied++;
> + task_io_account_write(PAGE_SIZE);
> + this_cpu_inc(bdp_ratelimits);
> +

[PATCH 2/3] writeback: allow for dirty metadata accounting

2016-08-22 Thread Josef Bacik

Provide a mechanism for file systems to indicate how much dirty metadata they
are holding.  This introduces a few things

1) Zone stats for dirty metadata, which is the same as the NR_FILE_DIRTY.
2) WB stat for dirty metadata.  This way we know if we need to try and call into
the file system to write out metadata.  This could potentially be used in the
future to make balancing of dirty pages smarter.

Signed-off-by: Josef Bacik 
---
 arch/tile/mm/pgtable.c   |   3 +-
 drivers/base/node.c  |   2 +
 fs/fs-writeback.c|   1 +
 fs/proc/meminfo.c|   2 +
 include/linux/backing-dev-defs.h |   1 +
 include/linux/mm.h   |   7 +++
 include/linux/mmzone.h   |   1 +
 include/trace/events/writeback.h |   7 ++-
 mm/backing-dev.c |   2 +
 mm/page-writeback.c  | 100 +--
 mm/page_alloc.c  |   7 ++-
 mm/vmscan.c  |   3 +-
 12 files changed, 127 insertions(+), 9 deletions(-)

diff --git a/arch/tile/mm/pgtable.c b/arch/tile/mm/pgtable.c
index 7cc6ee7..9543468 100644
--- a/arch/tile/mm/pgtable.c
+++ b/arch/tile/mm/pgtable.c
@@ -44,12 +44,13 @@ void show_mem(unsigned int filter)
 {
struct zone *zone;
 
-   pr_err("Active:%lu inactive:%lu dirty:%lu writeback:%lu unstable:%lu free:%lu\n slab:%lu mapped:%lu pagetables:%lu bounce:%lu pagecache:%lu swap:%lu\n",
+   pr_err("Active:%lu inactive:%lu dirty:%lu metadata_dirty:%lu writeback:%lu unstable:%lu free:%lu\n slab:%lu mapped:%lu pagetables:%lu bounce:%lu pagecache:%lu swap:%lu\n",
   (global_node_page_state(NR_ACTIVE_ANON) +
global_node_page_state(NR_ACTIVE_FILE)),
   (global_node_page_state(NR_INACTIVE_ANON) +
global_node_page_state(NR_INACTIVE_FILE)),
   global_node_page_state(NR_FILE_DIRTY),
+  global_node_page_state(NR_METADATA_DIRTY),
   global_node_page_state(NR_WRITEBACK),
   global_node_page_state(NR_UNSTABLE_NFS),
   global_page_state(NR_FREE_PAGES),
diff --git a/drivers/base/node.c b/drivers/base/node.c
index 5548f96..efc867b2 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -99,6 +99,7 @@ static ssize_t node_read_meminfo(struct device *dev,
 #endif
n += sprintf(buf + n,
   "Node %d Dirty:  %8lu kB\n"
+  "Node %d MetadataDirty:  %8lu kB\n"
   "Node %d Writeback:  %8lu kB\n"
   "Node %d FilePages:  %8lu kB\n"
   "Node %d Mapped: %8lu kB\n"
@@ -119,6 +120,7 @@ static ssize_t node_read_meminfo(struct device *dev,
 #endif
,
   nid, K(node_page_state(pgdat, NR_FILE_DIRTY)),
+  nid, K(node_page_state(pgdat, NR_METADATA_DIRTY)),
   nid, K(node_page_state(pgdat, NR_WRITEBACK)),
   nid, K(node_page_state(pgdat, NR_FILE_PAGES)),
   nid, K(node_page_state(pgdat, NR_FILE_MAPPED)),
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 56c8fda..d329f89 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -1809,6 +1809,7 @@ static unsigned long get_nr_dirty_pages(void)
 {
return global_node_page_state(NR_FILE_DIRTY) +
global_node_page_state(NR_UNSTABLE_NFS) +
+   global_node_page_state(NR_METADATA_DIRTY) +
get_nr_dirty_inodes();
 }
 
diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
index 09e18fd..8ca094f 100644
--- a/fs/proc/meminfo.c
+++ b/fs/proc/meminfo.c
@@ -80,6 +80,7 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
"SwapTotal:  %8lu kB\n"
"SwapFree:   %8lu kB\n"
"Dirty:  %8lu kB\n"
+   "MetadataDirty:  %8lu kB\n"
"Writeback:  %8lu kB\n"
"AnonPages:  %8lu kB\n"
"Mapped: %8lu kB\n"
@@ -139,6 +140,7 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
K(i.totalswap),
K(i.freeswap),
K(global_node_page_state(NR_FILE_DIRTY)),
+   K(global_node_page_state(NR_METADATA_DIRTY)),
K(global_node_page_state(NR_WRITEBACK)),
K(global_node_page_state(NR_ANON_MAPPED)),
K(global_node_page_state(NR_FILE_MAPPED)),
diff --git a/include/linux/backing-dev-defs.h b/include/linux/backing-dev-defs.h
index 3f10307..1200aae 100644
--- a/include/linux/backing-dev-defs.h
+++ b/include/linux/backing-dev-defs.h
@@ -34,6 +34,7 @@ typedef int (congested_fn)(void *, int);
 enum wb_stat_item {
WB_RECLAIMABLE,
WB_WRITEBACK,
+   WB_METADATA_DIRTY,
WB_DIRTIED,
WB_WRITTEN,
NR_WB_STAT_ITEMS
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 08ed53e..5a3f626 100644
