Re: [PATCH v4 2/4] mm, proc: account for shmem swap in /proc/pid/smaps

2015-10-26 Thread Jerome Marchand
On 10/21/2015 04:39 PM, Vlastimil Babka wrote:
> On 10/05/2015 05:01 AM, Hugh Dickins wrote:
>> On Fri, 2 Oct 2015, Vlastimil Babka wrote:

>> As you acknowledge in the commit message, if a file of 100 pages
>> were copied to tmpfs, and 100 tasks map its full extent, but they
>> all mess around with the first 50 pages and take no interest in
>> the last 50, then it's quite likely that that last 50 will get
>> swapped out; then with your patch, 100 tasks are each shown as
>> using 50 pages of swap, when none of them are actually using any.
> 
> Yeah, but isn't it the same with private memory which was swapped out at
> some point and we don't know if it will be touched or not? The
> difference is in private case we know the process touched it at least
> once, but that can also mean nothing for the future (or maybe it just
> mmapped with MAP_POPULATE and didn't care about half of it).
> 
> That's basically what I was trying to say in the changelog. I interpret
> the Swap: value as the amount of swap-in potential, if the process was
> going to access it, which is what the particular customer also expects
> (see below). In that case showing zero is IMHO wrong and inconsistent
> with the anonymous private mappings.

I didn't understand the changelog that way and IMHO it's a pretty
specific interpretation. I've always understood memory accounting as
being primarily the answer to the question: how many resources does a
process use? I guess its meaning has been overloaded with corollaries
that are only true in the simplest non-shared cases, such as yours
or "how much memory would be freed if this process goes away?", but I
don't think it should ever be used as a definition.

I suppose the reason I didn't understand the changelog the way you
intended is that I think it's sometimes correct to blame a
process for pages it never accessed (and I also believe that
over-accounting is better than under-accounting, but I must admit
that is a quite arbitrary point of view). For instance, what if a
process has a shared anonymous mapping that includes pages that it
never accessed, but which have been populated by another process that has
already exited or munmapped the range? That process is not to blame for
the appearance of these pages, but it's the only reason why they
stay.
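
For illustration, a minimal hypothetical program demonstrating that scenario
(sizes and values are arbitrary, not from the thread):

/* The child is the only process that ever touches the shared anonymous
 * region; once it exits, the parent's mapping is the only reason the
 * pages (or, later, their swap slots) stay around, even though the
 * parent never accessed them. */
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
	size_t len = 100 * sysconf(_SC_PAGESIZE);
	char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_SHARED | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return EXIT_FAILURE;

	if (fork() == 0) {
		memset(p, 0xaa, len);	/* child populates every page */
		_exit(0);
	}
	wait(NULL);			/* child has exited */
	pause();			/* parent keeps the mapping, untouched */
	return EXIT_SUCCESS;
}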

I'll offer a lollipop to anyone who comes up with a simple consistent
model on how to account shmem pages for all the possible cases, a
"Grand Unified Theory of Memory Accounting" so to speak.

> Other non-perfect solutions that come to mind:
> 
> 1) For private mappings, count only the swapents. "Swap:" is no longer
> showing full swap-in potential though.
> 2) For private mappings, do not count swapents. Ditto.
> 3) Provide two separate counters. The user won't know how much they
> overlap, though.
> 
> From these I would be inclined towards 3) as being more universal,
> although then it's no longer a simple "we're fixing a Swap: 0 value
> which is wrong", but closer to original Jerome's versions, which IIRC
> introduced several shmem-specific counters.

You remember correctly. Given all the controversy around shmem
accounting, maybe it would indeed be better to leave the existing
counters, which are relatively well defined and understood, untouched,
and add new shmem-specific counters to mess around with instead.

Thanks,
Jerome




Re: [PATCH v4 2/4] mm, proc: account for shmem swap in /proc/pid/smaps

2015-10-21 Thread Hugh Dickins
On Wed, 21 Oct 2015, Vlastimil Babka wrote:
> On 10/05/2015 05:01 AM, Hugh Dickins wrote:
> > On Fri, 2 Oct 2015, Vlastimil Babka wrote:
> > 
> > > Currently, /proc/pid/smaps will always show "Swap: 0 kB" for
> > > shmem-backed mappings, even if the mapped portion does contain pages
> > > that were swapped out. This is because unlike private anonymous
> > > mappings, shmem does not change the pte to a swap entry, but to
> > > pte_none when swapping the page out. In the smaps page walk, such a
> > > page thus looks like it was never faulted in.
> > > 
> > > This patch changes smaps_pte_entry() to determine the swap status
> > > for such pte_none entries of shmem mappings, similarly to how
> > > mincore_page() does it. Swapped out pages are thus accounted for.
> > > 
> > > The accounting is arguably still not as precise as for private
> > > anonymous mappings, since now we will also count pages that the
> > > process in question never accessed, but only another process
> > > populated them and then let them become swapped out. I believe it is
> > > still less confusing and subtle than not showing any swap usage by
> > > shmem mappings at all. Also, swapped out pages only become a
> > > performance issue for future accesses, and we cannot predict those
> > > for either kind of mapping.
> > > 
> > > Signed-off-by: Vlastimil Babka 
> > > Acked-by: Konstantin Khlebnikov 
> > 
> > Neither Ack nor Nack from me.
> > 
> > I don't want to stand in the way of this patch, if you and others
> > believe that it will help to diagnose problems in the field better
> > than what's shown at present; but to me it looks dangerously like
> > replacing no information by wrong information.
> > 
> > As you acknowledge in the commit message, if a file of 100 pages
> > were copied to tmpfs, and 100 tasks map its full extent, but they
> > all mess around with the first 50 pages and take no interest in
> > the last 50, then it's quite likely that that last 50 will get
> > swapped out; then with your patch, 100 tasks are each shown as
> > using 50 pages of swap, when none of them are actually using any.
> 
> Yeah, but isn't it the same with private memory which was swapped out at some
> point and we don't know if it will be touched or not? The
> difference is in private case we know the process touched it at least
> once, but that can also mean nothing for the future (or maybe it just
> mmapped with MAP_POPULATE and didn't care about half of it).

I see that as quite different myself; but agree that neither way
predicts the future.  Now, if you can make a patch to predict the future...

FWIW, I do seem to be looking at it more from a point of view of how
much swap the process is using, whereas you're looking at it more
from a point of view of what delays would be incurred in accessing.

> 
> That's basically what I was trying to say in the changelog. I interpret
> the Swap: value as the amount of swap-in potential, if the process was
> going to access it, which is what the particular customer also expects (see
> below). In that case showing zero is IMHO wrong and inconsistent with the
> anonymous private mappings.

Yes, your changelog is honest about the difference, I don't dispute that.
As I said, neither Ack nor Nack from me: I just don't feel in a position
to judge whether changing the output of smaps to please this customer is
likely to displease another customer or not.

> 
> > It is rather as if we didn't bother to record Rss, and just put
> > Size in there instead: you are (for understandable reasons) treating
> > the virtual address space as if every page of it had been touched.
> > 
> > But I accept that there may well be a class of processes and problems
> > which would be better served by this fiction than the present: I expect
> > you have much more experience of helping out in such situations than I.
> 
> Well, the customers driving this change would in the best case want to
> see the shmem swap accounted continuously and e.g. see it immediately in the
> top output. Fixing (IMHO) the smaps output is the next best thing. The use
> case here is an application that really doesn't like page faults, and has a
> background thread that checks and prefaults such areas when they are expected
> to be used soon. So they would like to identify these areas.

And I guess I won't be able to sell mlock(2) to you :)

Still neither Ack nor Nack from me: while your number is more information
(or misinformation) than always 0, it's still not clear to me that it will
give them what they need.
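
For reference, the mlock(2) route Hugh alludes to would pin the
latency-sensitive range outright instead of re-faulting it after swap-out
(a sketch, with a hypothetical helper name):

#include <sys/mman.h>

/* Fault the range in and keep it resident; the pages can no longer be
 * swapped out, at the cost of counting against RLIMIT_MEMLOCK. */
static int pin_hot_range(void *addr, size_t len)
{
	return mlock(addr, len);
}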

...
> > 
> > And for private mappings of tmpfs files?  I expected it to show an
> > indeterminate mixture of the two, but it looks like you treat the private
> > mapping just like a shared one, and take no notice of the COWed pages
> > out on swap which would have been reported before.  Oh, no, I think
> > I misread, and you add the two together?  I agree that's the easiest
> > thing to do, and therefore perhaps the best; but 

Re: [PATCH v4 2/4] mm, proc: account for shmem swap in /proc/pid/smaps

2015-10-21 Thread Vlastimil Babka

On 10/05/2015 05:01 AM, Hugh Dickins wrote:
> On Fri, 2 Oct 2015, Vlastimil Babka wrote:


> > Currently, /proc/pid/smaps will always show "Swap: 0 kB" for shmem-backed
> > mappings, even if the mapped portion does contain pages that were swapped
> > out. This is because unlike private anonymous mappings, shmem does not
> > change the pte to a swap entry, but to pte_none when swapping the page
> > out. In the smaps page walk, such a page thus looks like it was never
> > faulted in.
> > 
> > This patch changes smaps_pte_entry() to determine the swap status for
> > such pte_none entries of shmem mappings, similarly to how mincore_page()
> > does it. Swapped out pages are thus accounted for.
> > 
> > The accounting is arguably still not as precise as for private anonymous
> > mappings, since now we will also count pages that the process in question
> > never accessed, but only another process populated them and then let them
> > become swapped out. I believe it is still less confusing and subtle than
> > not showing any swap usage by shmem mappings at all. Also, swapped out
> > pages only become a performance issue for future accesses, and we cannot
> > predict those for either kind of mapping.
> > 
> > Signed-off-by: Vlastimil Babka 
> > Acked-by: Konstantin Khlebnikov 


> Neither Ack nor Nack from me.
> 
> I don't want to stand in the way of this patch, if you and others
> believe that it will help to diagnose problems in the field better
> than what's shown at present; but to me it looks dangerously like
> replacing no information by wrong information.
> 
> As you acknowledge in the commit message, if a file of 100 pages
> were copied to tmpfs, and 100 tasks map its full extent, but they
> all mess around with the first 50 pages and take no interest in
> the last 50, then it's quite likely that that last 50 will get
> swapped out; then with your patch, 100 tasks are each shown as
> using 50 pages of swap, when none of them are actually using any.


Yeah, but isn't it the same with private memory which was swapped out at
some point and we don't know if it will be touched or not? The
difference is in the private case we know the process touched it at least
once, but that can also mean nothing for the future (or maybe it just
mmapped with MAP_POPULATE and didn't care about half of it).

That's basically what I was trying to say in the changelog. I interpret
the Swap: value as the amount of swap-in potential, if the process was
going to access it, which is what the particular customer also expects 
(see below). In that case showing zero is IMHO wrong and inconsistent 
with the anonymous private mappings.



> It is rather as if we didn't bother to record Rss, and just put
> Size in there instead: you are (for understandable reasons) treating
> the virtual address space as if every page of it had been touched.
> 
> But I accept that there may well be a class of processes and problems
> which would be better served by this fiction than the present: I expect
> you have much more experience of helping out in such situations than I.


Well, the customers driving this change would in the best case want to
see the shmem swap accounted continuously and e.g. see it immediately in
the top output. Fixing (IMHO) the smaps output is the next best thing.
The use case here is an application that really doesn't like page faults,
and has a background thread that checks and prefaults such areas when they
are expected to be used soon. So they would like to identify these areas.
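
For illustration, a hypothetical sketch of such a background prefaulter's
core loop (the parsing is simplified, and real code would also have to skip
unreadable regions such as PROT_NONE guards):

/* Scan /proc/self/smaps for ranges whose "Swap:" value is nonzero and
 * touch them, so a later access doesn't have to fault from swap.
 * Simplified: assumes every listed range is readable. */
#include <stdio.h>
#include <unistd.h>

static void prefault(unsigned long start, unsigned long end)
{
	long pagesize = sysconf(_SC_PAGESIZE);
	volatile const char *p;

	for (p = (const char *)start; p < (const char *)end; p += pagesize)
		(void)*p;	/* read one byte per page to fault it in */
}

int main(void)
{
	FILE *f = fopen("/proc/self/smaps", "r");
	char line[512];
	unsigned long start = 0, end = 0, kb;

	if (!f)
		return 1;
	while (fgets(line, sizeof(line), f)) {
		unsigned long s, e;

		if (sscanf(line, "%lx-%lx", &s, &e) == 2) {
			start = s;	/* header line of the next VMA */
			end = e;
		} else if (sscanf(line, "Swap: %lu kB", &kb) == 1 && kb > 0) {
			prefault(start, end);
		}
	}
	fclose(f);
	return 0;
}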



> And perhaps you do balance it nicely by going to the opposite extreme
> with SwapPss 0 for all (again for the eminently understandable reason,
> that it would be a whole lot more new code to work out the right number).
> Altogether, you're saying everyone's using more swap than they probably
> are, but that's okay because it's infinitely shared.
> 
> I am not at all angling for you or anyone to make the changes necessary
> to make those numbers accurate.  I think there's a point at which we
> stop cluttering up the core kernel code, just for the sake of
> maintaining numbers for a /proc file someone thought was a good idea
> at the time.  But I am hoping that if this patch goes in, you will take
> responsibility for batting away all the complaints that it doesn't work
> as this or that person expected, rather than a long stream of patches
> to refine it.


I don't plan to fix SwapPss, so I can drop the "currently".


> I think the root problem is that we're trying to use /proc/<pid>/smaps
> for something that's independent of <pid> and its maps: a shmem object.
> Would we be better served by a tmpfs-ish filesystem mounted somewhere,
> which gives names to all the objects on the internal mount of tmpfs
> (SysV SHM, memfds etc); and some fincore-ish syscalls which could be
> used to interrogate how much swap any tmpfs file is using in any range?
> (I am not volunteering to write this, not in the foreseeable future.)
> 
> I have no idea of the security implications of naming the hidden, it
> may be a non-starter.  And my guess is, it would be nice if it already
> existed, but you need a solution today to some problems that have been
> wasting 


Re: [PATCH v4 2/4] mm, proc: account for shmem swap in /proc/pid/smaps

2015-10-06 Thread Vlastimil Babka

On 10/03/2015 12:37 AM, Andrew Morton wrote:
> On Fri,  2 Oct 2015 15:35:49 +0200 Vlastimil Babka  wrote:



> > --- a/include/linux/shmem_fs.h
> > +++ b/include/linux/shmem_fs.h
> > @@ -60,6 +60,12 @@ extern struct page *shmem_read_mapping_page_gfp(struct address_space *mapping,
> >  extern void shmem_truncate_range(struct inode *inode, loff_t start, loff_t end);
> >  extern int shmem_unuse(swp_entry_t entry, struct page *page);
> > 
> > +#ifdef CONFIG_SWAP
> > +extern unsigned long shmem_swap_usage(struct inode *inode);
> > +extern unsigned long shmem_partial_swap_usage(struct address_space *mapping,
> > +   pgoff_t start, pgoff_t end);
> > +#endif
> 
> CONFIG_SWAP is wrong, isn't it?  It should be CONFIG_SHMEM if anything.


Yeah, I overlooked this while removing the other ifdefs. Thanks.



Re: [PATCH v4 2/4] mm, proc: account for shmem swap in /proc/pid/smaps

2015-10-05 Thread Peter Zijlstra
On Fri, Oct 02, 2015 at 03:35:49PM +0200, Vlastimil Babka wrote:
> +static unsigned long smaps_shmem_swap(struct vm_area_struct *vma)
> +{
> + struct inode *inode;
> + unsigned long swapped;
> + pgoff_t start, end;
> +
> + if (!vma->vm_file)
> + return 0;
> +
> + inode = file_inode(vma->vm_file);
> +
> + if (!shmem_mapping(inode->i_mapping))
> + return 0;
> +
> + /*
> +  * The easier cases are when the shmem object has nothing in swap, or
> +  * we have the whole object mapped. Then we can simply use the stats
> +  * that are already tracked by shmem.
> +  */
> + swapped = shmem_swap_usage(inode);
> +
> + if (swapped == 0)
> + return 0;
> +
> + if (vma->vm_end - vma->vm_start >= inode->i_size)
> + return swapped;
> +
> + /*
> +  * Here we have to inspect individual pages in our mapped range to
> +  * determine how much of them are swapped out. Thanks to RCU, we don't
> +  * need i_mutex to protect against truncating or hole punching.
> +  */

At the very least put in an assertion that we hold the RCU read lock,
otherwise RCU doesn't guarantee anything and it's not obvious it is held
here.
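
For illustration, one way to make that requirement explicit at the top of
the function, assuming the RCU_LOCKDEP_WARN() helper available in kernels
of that era (a sketch, not part of the posted patch):

	/* Verify the locking contract under lockdep: the radix tree walk
	 * below is only safe while the RCU read lock is held. */
	RCU_LOCKDEP_WARN(!rcu_read_lock_held(),
			 "shmem_partial_swap_usage() needs rcu_read_lock()");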

> + start = linear_page_index(vma, vma->vm_start);
> + end = linear_page_index(vma, vma->vm_end);
> +
> + return shmem_partial_swap_usage(inode->i_mapping, start, end);
> +}

> + * Determine (in bytes) how much of the whole shmem object is swapped out.
> + */
> +unsigned long shmem_swap_usage(struct inode *inode)
> +{
> + struct shmem_inode_info *info = SHMEM_I(inode);
> + unsigned long swapped;
> +
> + /* Mostly an overkill, but it's not atomic64_t */

Yeah, that don't make any kind of sense.

> + spin_lock(&info->lock);
> + swapped = info->swapped;
> + spin_unlock(&info->lock);
> +
> + return swapped << PAGE_SHIFT;
> +}
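
If the lock really only serializes a read of a single word, a hedged
alternative (a sketch of one option, not what the v4 patch does) would be
an annotated lockless read:

/* info->swapped is a single unsigned long, so on the architectures the
 * kernel supports a plain word read is atomic; READ_ONCE() documents
 * (and pins down) the unsynchronized access. */
unsigned long shmem_swap_usage(struct inode *inode)
{
	return READ_ONCE(SHMEM_I(inode)->swapped) << PAGE_SHIFT;
}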


Re: [PATCH v4 2/4] mm, proc: account for shmem swap in /proc/pid/smaps

2015-10-04 Thread Hugh Dickins
On Fri, 2 Oct 2015, Vlastimil Babka wrote:

> Currently, /proc/pid/smaps will always show "Swap: 0 kB" for shmem-backed
> mappings, even if the mapped portion does contain pages that were swapped
> out. This is because unlike private anonymous mappings, shmem does not
> change the pte to a swap entry, but to pte_none when swapping the page
> out. In the smaps page walk, such a page thus looks like it was never
> faulted in.
> 
> This patch changes smaps_pte_entry() to determine the swap status for
> such pte_none entries of shmem mappings, similarly to how mincore_page()
> does it. Swapped out pages are thus accounted for.
> 
> The accounting is arguably still not as precise as for private anonymous
> mappings, since now we will also count pages that the process in question
> never accessed, but only another process populated them and then let them
> become swapped out. I believe it is still less confusing and subtle than
> not showing any swap usage by shmem mappings at all. Also, swapped out
> pages only become a performance issue for future accesses, and we cannot
> predict those for either kind of mapping.
> 
> Signed-off-by: Vlastimil Babka 
> Acked-by: Konstantin Khlebnikov 

Neither Ack nor Nack from me.

I don't want to stand in the way of this patch, if you and others
believe that it will help to diagnose problems in the field better
than what's shown at present; but to me it looks dangerously like
replacing no information by wrong information.

As you acknowledge in the commit message, if a file of 100 pages
were copied to tmpfs, and 100 tasks map its full extent, but they
all mess around with the first 50 pages and take no interest in
the last 50, then it's quite likely that that last 50 will get
swapped out; then with your patch, 100 tasks are each shown as
using 50 pages of swap, when none of them are actually using any.

It is rather as if we didn't bother to record Rss, and just put
Size in there instead: you are (for understandable reasons) treating
the virtual address space as if every page of it had been touched.

But I accept that there may well be a class of processes and problems
which would be better served by this fiction than the present: I expect
you have much more experience of helping out in such situations than I.

And perhaps you do balance it nicely by going to the opposite extreme
with SwapPss 0 for all (again for the eminently understandable reason,
that it would be a whole lot more new code to work out the right number).
Altogether, you're saying everyone's using more swap than they probably
are, but that's okay because it's infinitely shared.

I am not at all angling for you or anyone to make the changes necessary
to make those numbers accurate.  I think there's a point at which we
stop cluttering up the core kernel code, just for the sake of
maintaining numbers for a /proc file someone thought was a good idea
at the time.  But I am hoping that if this patch goes in, you will take
responsibility for batting away all the complaints that it doesn't work
as this or that person expected, rather than a long stream of patches
to refine it.

I think the root problem is that we're trying to use /proc/<pid>/smaps
for something that's independent of <pid> and its maps: a shmem object.
Would we be better served by a tmpfs-ish filesystem mounted somewhere,
which gives names to all the objects on the internal mount of tmpfs
(SysV SHM, memfds etc); and some fincore-ish syscalls which could be
used to interrogate how much swap any tmpfs file is using in any range?
(I am not volunteering to write this, not in the foreseeable future.)

I have no idea of the security implications of naming the hidden, it
may be a non-starter.  And my guess is, it would be nice if it already
existed, but you need a solution today to some problems that have been
wasting your time; and grafting it into smaps looks to be good enough.

Some comments on your implementation below.

> ---
>  Documentation/filesystems/proc.txt |  6 ++--
>  fs/proc/task_mmu.c                 | 48 ++++++
>  include/linux/shmem_fs.h           |  6 ++++
>  mm/shmem.c                         | 61 ++++++
>  4 files changed, 119 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
> index 7ef50cb..82d3657 100644
> --- a/Documentation/filesystems/proc.txt
> +++ b/Documentation/filesystems/proc.txt
> @@ -457,8 +457,10 @@ accessed.
>  a mapping associated with a file may contain anonymous pages: when MAP_PRIVATE
>  and a page is modified, the file page is replaced by a private anonymous copy.
>  "Swap" shows how much would-be-anonymous memory is also used, but out on
> -swap.
> -"SwapPss" shows proportional swap share of this mapping.
> +swap. For shmem mappings, "Swap" shows how much of the mapped portion of the
> +underlying shmem object is on swap.

And for private mappings of tmpfs files?  I expected it to show an


Re: [PATCH v4 2/4] mm, proc: account for shmem swap in /proc/pid/smaps

2015-10-02 Thread Andrew Morton
On Fri,  2 Oct 2015 15:35:49 +0200 Vlastimil Babka  wrote:

> Currently, /proc/pid/smaps will always show "Swap: 0 kB" for shmem-backed
> mappings, even if the mapped portion does contain pages that were swapped
> out. This is because unlike private anonymous mappings, shmem does not
> change the pte to a swap entry, but to pte_none when swapping the page
> out. In the smaps page walk, such a page thus looks like it was never
> faulted in.
> 
> This patch changes smaps_pte_entry() to determine the swap status for
> such pte_none entries of shmem mappings, similarly to how mincore_page()
> does it. Swapped out pages are thus accounted for.
> 
> The accounting is arguably still not as precise as for private anonymous
> mappings, since now we will also count pages that the process in question
> never accessed, but only another process populated them and then let them
> become swapped out. I believe it is still less confusing and subtle than
> not showing any swap usage by shmem mappings at all. Also, swapped out
> pages only become a performance issue for future accesses, and we cannot
> predict those for either kind of mapping.
> 
> ...
>
> --- a/include/linux/shmem_fs.h
> +++ b/include/linux/shmem_fs.h
> @@ -60,6 +60,12 @@ extern struct page *shmem_read_mapping_page_gfp(struct address_space *mapping,
>  extern void shmem_truncate_range(struct inode *inode, loff_t start, loff_t end);
>  extern int shmem_unuse(swp_entry_t entry, struct page *page);
>  
> +#ifdef CONFIG_SWAP
> +extern unsigned long shmem_swap_usage(struct inode *inode);
> +extern unsigned long shmem_partial_swap_usage(struct address_space *mapping,
> + pgoff_t start, pgoff_t end);
> +#endif

CONFIG_SWAP is wrong, isn't it?  It should be CONFIG_SHMEM if anything.

I'd just do

--- a/include/linux/shmem_fs.h~mm-proc-account-for-shmem-swap-in-proc-pid-smaps-fix
+++ a/include/linux/shmem_fs.h
@@ -60,11 +60,9 @@ extern struct page *shmem_read_mapping_p
 extern void shmem_truncate_range(struct inode *inode, loff_t start, loff_t end);
 extern int shmem_unuse(swp_entry_t entry, struct page *page);
 
-#ifdef CONFIG_SWAP
 extern unsigned long shmem_swap_usage(struct inode *inode);
 extern unsigned long shmem_partial_swap_usage(struct address_space *mapping,
pgoff_t start, pgoff_t end);
-#endif
 
 static inline struct page *shmem_read_mapping_page(
struct address_space *mapping, pgoff_t index)


We don't need the ifdefs around declarations and they're a pain to
maintain and they'd add a *ton* of clutter if we even tried to do this
for real.


Re: [PATCH v4 2/4] mm, proc: account for shmem swap in /proc/pid/smaps

2015-10-02 Thread Michal Hocko
On Fri 02-10-15 15:35:49, Vlastimil Babka wrote:
> Currently, /proc/pid/smaps will always show "Swap: 0 kB" for shmem-backed
> mappings, even if the mapped portion does contain pages that were swapped
> out. This is because unlike private anonymous mappings, shmem does not
> change the pte to a swap entry, but to pte_none when swapping the page
> out. In the smaps page walk, such a page thus looks like it was never
> faulted in.
> 
> This patch changes smaps_pte_entry() to determine the swap status for
> such pte_none entries of shmem mappings, similarly to how mincore_page()
> does it. Swapped out pages are thus accounted for.
> 
> The accounting is arguably still not as precise as for private anonymous
> mappings, since now we will also count pages that the process in question
> never accessed, but only another process populated them and then let them
> become swapped out. I believe it is still less confusing and subtle than
> not showing any swap usage by shmem mappings at all. Also, swapped out
> pages only become a performance issue for future accesses, and we cannot
> predict those for either kind of mapping.
> 
> Signed-off-by: Vlastimil Babka 
> Acked-by: Konstantin Khlebnikov 

Acked-by: Michal Hocko 

But I think comments explaining why i_mutex is not needed are
confusing and incomplete.
[...]
> + /*
> +  * Here we have to inspect individual pages in our mapped range to
> +  * determine how much of them are swapped out. Thanks to RCU, we don't
> +  * need i_mutex to protect against truncating or hole punching.
> +  */
> + start = linear_page_index(vma, vma->vm_start);
> + end = linear_page_index(vma, vma->vm_end);
> +
> + return shmem_partial_swap_usage(inode->i_mapping, start, end);
[...]
> +/*
> + * Determine (in bytes) how many pages within the given range are swapped out.
> + *
> + * Can be called without i_mutex or mapping->tree_lock thanks to RCU.
> + */
> +unsigned long shmem_partial_swap_usage(struct address_space *mapping,
> + pgoff_t start, pgoff_t end)

AFAIU RCU only helps to prevent accessing nodes which were freed
from the radix tree. The reason why we do not need to hold i_mutex is
that the radix tree iterator would break out of the loop if we entered
a node which backed a truncated range. At least this is my understanding;
I might be wrong here of course.
-- 
Michal Hocko
SUSE Labs
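
For illustration, a reconstruction of how such an RCU-protected radix-tree
walk can look, using the helpers available in kernels of that era (a
sketch, not the exact code from the posted patch):

/* Count swapped-out pages (stored as exceptional entries in the shmem
 * radix tree) between start and end. Entries removed by truncation or
 * hole punching simply stop turning up in the iteration; RCU only has
 * to keep the tree nodes themselves safe to dereference. */
unsigned long shmem_partial_swap_usage(struct address_space *mapping,
				       pgoff_t start, pgoff_t end)
{
	struct radix_tree_iter iter;
	void **slot;
	struct page *page;
	unsigned long swapped = 0;

	rcu_read_lock();
restart:
	radix_tree_for_each_slot(slot, &mapping->page_tree, &iter, start) {
		if (iter.index >= end)
			break;

		page = radix_tree_deref_slot(slot);
		/* raced with a node being replaced; restarting is cheap
		 * because this can essentially only happen near index 0 */
		if (radix_tree_deref_retry(page))
			goto restart;

		if (radix_tree_exceptional_entry(page))
			swapped++;	/* a swapped-out shmem page */

		if (need_resched()) {
			cond_resched_rcu();
			start = iter.index + 1;
			goto restart;	/* resume where we left off */
		}
	}
	rcu_read_unlock();

	return swapped << PAGE_SHIFT;
}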


Re: [PATCH v4 2/4] mm, proc: account for shmem swap in /proc/pid/smaps

2015-10-02 Thread Jerome Marchand
On 10/02/2015 03:35 PM, Vlastimil Babka wrote:
> Currently, /proc/pid/smaps will always show "Swap: 0 kB" for shmem-backed
> mappings, even if the mapped portion does contain pages that were swapped
> out. This is because unlike private anonymous mappings, shmem does not
> change the pte to a swap entry, but to pte_none when swapping the page
> out. In the smaps page walk, such a page thus looks like it was never
> faulted in.
> 
> This patch changes smaps_pte_entry() to determine the swap status for
> such pte_none entries of shmem mappings, similarly to how mincore_page()
> does it. Swapped out pages are thus accounted for.
> 
> The accounting is arguably still not as precise as for private anonymous
> mappings, since now we will also count pages that the process in question
> never accessed, but only another process populated them and then let them
> become swapped out. I believe it is still less confusing and subtle than
> not showing any swap usage by shmem mappings at all. Also, swapped out
> pages only become a performance issue for future accesses, and we cannot
> predict those for either kind of mapping.

Agreed, this is much better than the current situation. I don't think
there is such a thing as a perfect accounting of shared pages anyway.

> 
> Signed-off-by: Vlastimil Babka 
> Acked-by: Konstantin Khlebnikov 

Acked-by: Jerome Marchand 







[PATCH v4 2/4] mm, proc: account for shmem swap in /proc/pid/smaps

2015-10-02 Thread Vlastimil Babka
Currently, /proc/pid/smaps will always show "Swap: 0 kB" for shmem-backed
mappings, even if the mapped portion does contain pages that were swapped
out. This is because unlike private anonymous mappings, shmem does not
change the pte to a swap entry, but to pte_none when swapping the page out.
In the smaps page walk, such a page thus looks like it was never faulted in.

This patch changes smaps_pte_entry() to determine the swap status for such
pte_none entries of shmem mappings, similarly to how mincore_page() does it.
Swapped out pages are thus accounted for.

The accounting is arguably still not as precise as for private anonymous
mappings, since now we will also count pages that the process in question
never accessed, but only another process populated them and then let them
become swapped out. I believe it is still less confusing and subtle than not
showing any swap usage by shmem mappings at all. Also, swapped out pages
only become a performance issue for future accesses, and we cannot predict
those for either kind of mapping.

Signed-off-by: Vlastimil Babka 
Acked-by: Konstantin Khlebnikov 
---
 Documentation/filesystems/proc.txt |  6 ++--
 fs/proc/task_mmu.c                 | 48 ++++++
 include/linux/shmem_fs.h           |  6 ++++
 mm/shmem.c                         | 61 ++++++
 4 files changed, 119 insertions(+), 2 deletions(-)

diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index 7ef50cb..82d3657 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -457,8 +457,10 @@ accessed.
 a mapping associated with a file may contain anonymous pages: when MAP_PRIVATE
 and a page is modified, the file page is replaced by a private anonymous copy.
 "Swap" shows how much would-be-anonymous memory is also used, but out on
-swap.
-"SwapPss" shows proportional swap share of this mapping.
+swap. For shmem mappings, "Swap" shows how much of the mapped portion of the
+underlying shmem object is on swap.
+"SwapPss" shows proportional swap share of this mapping. Shmem mappings will
+currently show 0 here.
 "AnonHugePages" shows the ammount of memory backed by transparent hugepage.
 "Shared_Hugetlb" and "Private_Hugetlb" show the ammounts of memory backed by
 hugetlbfs page which is *not* counted in "RSS" or "PSS" field for historical
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 04999b2..103457c 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -14,6 +14,7 @@
 #include <linux/swapops.h>
 #include <linux/mmu_notifier.h>
 #include <linux/page_idle.h>
+#include <linux/shmem_fs.h>
 
 #include <asm/elf.h>
 #include <asm/uaccess.h>
@@ -657,6 +658,51 @@ static int smaps_hugetlb_range(pte_t *pte, unsigned long hmask,
 }
 #endif /* HUGETLB_PAGE */
 
+#ifdef CONFIG_SHMEM
+static unsigned long smaps_shmem_swap(struct vm_area_struct *vma)
+{
+   struct inode *inode;
+   unsigned long swapped;
+   pgoff_t start, end;
+
+   if (!vma->vm_file)
+   return 0;
+
+   inode = file_inode(vma->vm_file);
+
+   if (!shmem_mapping(inode->i_mapping))
+   return 0;
+
+   /*
+* The easier cases are when the shmem object has nothing in swap, or
+* we have the whole object mapped. Then we can simply use the stats
+* that are already tracked by shmem.
+*/
+   swapped = shmem_swap_usage(inode);
+
+   if (swapped == 0)
+   return 0;
+
+   if (vma->vm_end - vma->vm_start >= inode->i_size)
+   return swapped;
+
+   /*
+* Here we have to inspect individual pages in our mapped range to
+* determine how much of them are swapped out. Thanks to RCU, we don't
+* need i_mutex to protect against truncating or hole punching.
+*/
+   start = linear_page_index(vma, vma->vm_start);
+   end = linear_page_index(vma, vma->vm_end);
+
+   return shmem_partial_swap_usage(inode->i_mapping, start, end);
+}
+#else
+static unsigned long smaps_shmem_swap(struct vm_area_struct *vma)
+{
+   return 0;
+}
+#endif
+
 static int show_smap(struct seq_file *m, void *v, int is_pid)
 {
struct vm_area_struct *vma = v;
@@ -674,6 +720,8 @@ static int show_smap(struct seq_file *m, void *v, int is_pid)
/* mmap_sem is held in m_start */
walk_page_vma(vma, &smaps_walk);
 
+   mss.swap += smaps_shmem_swap(vma);
+
show_map_vma(m, vma, is_pid);
 
seq_printf(m,
diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
index 50777b5..12519e4 100644
--- a/include/linux/shmem_fs.h
+++ b/include/linux/shmem_fs.h
@@ -60,6 +60,12 @@ extern struct page *shmem_read_mapping_page_gfp(struct address_space *mapping,
 extern void shmem_truncate_range(struct inode *inode, loff_t start, loff_t end);
 extern int shmem_unuse(swp_entry_t entry, struct page *page);
 
+#ifdef CONFIG_SWAP
+extern unsigned long shmem_swap_usage(struct inode *inode);
+extern unsigned long shmem_partial_swap_usage(struct address_space *mapping,
+  

[PATCH v4 2/4] mm, proc: account for shmem swap in /proc/pid/smaps

2015-10-02 Thread Vlastimil Babka
Currently, /proc/pid/smaps will always show "Swap: 0 kB" for shmem-backed
mappings, even if the mapped portion does contain pages that were swapped out.
This is because unlike private anonymous mappings, shmem does not change pte
to swap entry, but pte_none when swapping the page out. In the smaps page
walk, such page thus looks like it was never faulted in.

This patch changes smaps_pte_entry() to determine the swap status for such
pte_none entries for shmem mappings, similarly to how mincore_page() does it.
Swapped out pages are thus accounted for.

The accounting is arguably still not as precise as for private anonymous
mappings, since now we will count also pages that the process in question never
accessed, but only another process populated them and then let them become
swapped out. I believe it is still less confusing and subtle than not showing
any swap usage by shmem mappings at all. Also, swapped out pages only becomee a
performance issue for future accesses, and we cannot predict those for neither
kind of mapping.

Signed-off-by: Vlastimil Babka 
Acked-by: Konstantin Khlebnikov 
---
 Documentation/filesystems/proc.txt |  6 ++--
 fs/proc/task_mmu.c | 48 ++
 include/linux/shmem_fs.h   |  6 
 mm/shmem.c | 61 ++
 4 files changed, 119 insertions(+), 2 deletions(-)

diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index 7ef50cb..82d3657 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -457,8 +457,10 @@ accessed.
 a mapping associated with a file may contain anonymous pages: when MAP_PRIVATE
 and a page is modified, the file page is replaced by a private anonymous copy.
 "Swap" shows how much would-be-anonymous memory is also used, but out on
-swap.
-"SwapPss" shows proportional swap share of this mapping.
+swap. For shmem mappings, "Swap" shows how much of the mapped portion of the
+underlying shmem object is on swap.
+"SwapPss" shows proportional swap share of this mapping. Shmem mappings will
+currently show 0 here.
 "AnonHugePages" shows the ammount of memory backed by transparent hugepage.
 "Shared_Hugetlb" and "Private_Hugetlb" show the ammounts of memory backed by
 hugetlbfs page which is *not* counted in "RSS" or "PSS" field for historical
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 04999b2..103457c 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -14,6 +14,7 @@
 #include <linux/swapops.h>
 #include <linux/mmu_notifier.h>
 #include <linux/page_idle.h>
+#include <linux/shmem_fs.h>
 
 #include <asm/elf.h>
 #include <asm/uaccess.h>
@@ -657,6 +658,51 @@ static int smaps_hugetlb_range(pte_t *pte, unsigned long hmask,
 }
 #endif /* HUGETLB_PAGE */
 
+#ifdef CONFIG_SHMEM
+static unsigned long smaps_shmem_swap(struct vm_area_struct *vma)
+{
+   struct inode *inode;
+   unsigned long swapped;
+   pgoff_t start, end;
+
+   if (!vma->vm_file)
+   return 0;
+
+   inode = file_inode(vma->vm_file);
+
+   if (!shmem_mapping(inode->i_mapping))
+   return 0;
+
+   /*
+* The easier cases are when the shmem object has nothing in swap, or
+* we have the whole object mapped. Then we can simply use the stats
+* that are already tracked by shmem.
+*/
+   swapped = shmem_swap_usage(inode);
+
+   if (swapped == 0)
+   return 0;
+
+   if (vma->vm_end - vma->vm_start >= inode->i_size)
+   return swapped;
+
+   /*
+* Here we have to inspect individual pages in our mapped range to
+* determine how many of them are swapped out. Thanks to RCU, we don't
+* need i_mutex to protect against truncating or hole punching.
+*/
+   start = linear_page_index(vma, vma->vm_start);
+   end = linear_page_index(vma, vma->vm_end);
+
+   return shmem_partial_swap_usage(inode->i_mapping, start, end);
+}
+#else
+static unsigned long smaps_shmem_swap(struct vm_area_struct *vma)
+{
+   return 0;
+}
+#endif
+
 static int show_smap(struct seq_file *m, void *v, int is_pid)
 {
struct vm_area_struct *vma = v;
@@ -674,6 +720,8 @@ static int show_smap(struct seq_file *m, void *v, int is_pid)
/* mmap_sem is held in m_start */
	walk_page_vma(vma, &smaps_walk);
 
+   mss.swap += smaps_shmem_swap(vma);
+
show_map_vma(m, vma, is_pid);
 
seq_printf(m,
diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
index 50777b5..12519e4 100644
--- a/include/linux/shmem_fs.h
+++ b/include/linux/shmem_fs.h
@@ -60,6 +60,12 @@ extern struct page *shmem_read_mapping_page_gfp(struct address_space *mapping,
 extern void shmem_truncate_range(struct inode *inode, loff_t start, loff_t end);
 extern int shmem_unuse(swp_entry_t entry, struct page *page);
 
+#ifdef CONFIG_SWAP
+extern unsigned long shmem_swap_usage(struct inode *inode);
+extern unsigned long shmem_partial_swap_usage(struct address_space *mapping,
+					pgoff_t start, pgoff_t end);
+#endif
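
For the whole-object fast path above, shmem already maintains a per-inode
count of swapped-out pages (info->swapped in shmem_inode_info). A minimal
sketch of shmem_swap_usage() under that assumption; the actual mm/shmem.c
hunk is not quoted here and may differ:

unsigned long shmem_swap_usage(struct inode *inode)
{
	struct shmem_inode_info *info = SHMEM_I(inode);
	unsigned long swapped;

	/* A racy read without info->lock is fine for smaps reporting. */
	swapped = READ_ONCE(info->swapped);

	/* smaps_shmem_swap() above expects bytes. */
	return swapped << PAGE_SHIFT;
}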

Re: [PATCH v4 2/4] mm, proc: account for shmem swap in /proc/pid/smaps

2015-10-02 Thread Jerome Marchand
On 10/02/2015 03:35 PM, Vlastimil Babka wrote:
> Currently, /proc/pid/smaps will always show "Swap: 0 kB" for shmem-backed
> mappings, even if the mapped portion does contain pages that were swapped out.
> This is because, unlike private anonymous mappings, shmem does not change the
> pte to a swap entry, but to pte_none, when swapping a page out. In the smaps
> page walk, such a page thus looks like it was never faulted in.
> 
> This patch adds smaps_shmem_swap(), which determines the swap status of such
> pte_none entries for shmem mappings, similarly to how mincore_page() does it.
> Swapped-out pages are thus accounted for.
> 
> The accounting is arguably still not as precise as for private anonymous
> mappings, since we will now also count pages that the process in question
> never accessed, but that another process populated and then let become
> swapped out. I believe this is still less confusing and less subtle than not
> showing any swap usage for shmem mappings at all. Also, swapped-out pages
> only become a performance issue on future accesses, and we cannot predict
> those for either kind of mapping.

Agreed, this is much better than the current situation. I don't think
there is such a thing as a perfect accounting of shared pages anyway.

> 
> Signed-off-by: Vlastimil Babka 
> Acked-by: Konstantin Khlebnikov 

Acked-by: Jerome Marchand 







Re: [PATCH v4 2/4] mm, proc: account for shmem swap in /proc/pid/smaps

2015-10-02 Thread Michal Hocko
On Fri 02-10-15 15:35:49, Vlastimil Babka wrote:
> Currently, /proc/pid/smaps will always show "Swap: 0 kB" for shmem-backed
> mappings, even if the mapped portion does contain pages that were swapped out.
> This is because, unlike private anonymous mappings, shmem does not change the
> pte to a swap entry, but to pte_none, when swapping a page out. In the smaps
> page walk, such a page thus looks like it was never faulted in.
> 
> This patch adds smaps_shmem_swap(), which determines the swap status of such
> pte_none entries for shmem mappings, similarly to how mincore_page() does it.
> Swapped-out pages are thus accounted for.
> 
> The accounting is arguably still not as precise as for private anonymous
> mappings, since we will now also count pages that the process in question
> never accessed, but that another process populated and then let become
> swapped out. I believe this is still less confusing and less subtle than not
> showing any swap usage for shmem mappings at all. Also, swapped-out pages
> only become a performance issue on future accesses, and we cannot predict
> those for either kind of mapping.
> 
> Signed-off-by: Vlastimil Babka 
> Acked-by: Konstantin Khlebnikov 

Acked-by: Michal Hocko 

But I think the comments explaining why i_mutex is not needed are
confusing and incomplete.
[...]
> + /*
> +  * Here we have to inspect individual pages in our mapped range to
> +  * determine how many of them are swapped out. Thanks to RCU, we don't
> +  * need i_mutex to protect against truncating or hole punching.
> +  */
> + start = linear_page_index(vma, vma->vm_start);
> + end = linear_page_index(vma, vma->vm_end);
> +
> + return shmem_partial_swap_usage(inode->i_mapping, start, end);
[...]
> +/*
> + * Determine (in bytes) how many pages within the given range are swapped 
> out.
> + *
> + * Can be called without i_mutex or mapping->tree_lock thanks to RCU.
> + */
> +unsigned long shmem_partial_swap_usage(struct address_space *mapping,
> + pgoff_t start, pgoff_t end)

AFAIU, RCU only helps to prevent accessing nodes which were freed from
the radix tree. The reason we do not need to hold i_mutex is that the
radix tree iterator would break out of the loop if we entered a node
which backed a truncated range. At least this is my understanding; I
might be wrong here, of course.
-- 
Michal Hocko
SUSE Labs
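
For reference, the RCU-protected walk under discussion can be structured as
below; this is a sketch close to the version eventually merged, with the
iterator handling assumed from the radix tree API of this era:

unsigned long shmem_partial_swap_usage(struct address_space *mapping,
					pgoff_t start, pgoff_t end)
{
	struct radix_tree_iter iter;
	void **slot;
	struct page *page;
	unsigned long swapped = 0;

	rcu_read_lock();

restart:
	radix_tree_for_each_slot(slot, &mapping->page_tree, &iter, start) {
		if (iter.index >= end)
			break;

		page = radix_tree_deref_slot(slot);

		/*
		 * A retry should only happen at index 0, so we neither
		 * double count nor risk infinite restarts.
		 */
		if (radix_tree_deref_retry(page))
			goto restart;

		/* Swapped-out shmem pages are exceptional entries. */
		if (radix_tree_exceptional_entry(page))
			swapped++;

		if (need_resched()) {
			cond_resched_rcu();
			start = iter.index + 1;
			goto restart;
		}
	}

	rcu_read_unlock();

	return swapped << PAGE_SHIFT;
}

A truncated or hole-punched range simply stops yielding entries, so the walk
needs neither i_mutex nor mapping->tree_lock, which matches the reading above.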


Re: [PATCH v4 2/4] mm, proc: account for shmem swap in /proc/pid/smaps

2015-10-02 Thread Andrew Morton
On Fri,  2 Oct 2015 15:35:49 +0200 Vlastimil Babka  wrote:

> Currently, /proc/pid/smaps will always show "Swap: 0 kB" for shmem-backed
> mappings, even if the mapped portion does contain pages that were swapped out.
> This is because, unlike private anonymous mappings, shmem does not change the
> pte to a swap entry, but to pte_none, when swapping a page out. In the smaps
> page walk, such a page thus looks like it was never faulted in.
> 
> This patch adds smaps_shmem_swap(), which determines the swap status of such
> pte_none entries for shmem mappings, similarly to how mincore_page() does it.
> Swapped-out pages are thus accounted for.
> 
> The accounting is arguably still not as precise as for private anonymous
> mappings, since we will now also count pages that the process in question
> never accessed, but that another process populated and then let become
> swapped out. I believe this is still less confusing and less subtle than not
> showing any swap usage for shmem mappings at all. Also, swapped-out pages
> only become a performance issue on future accesses, and we cannot predict
> those for either kind of mapping.
> 
> ...
>
> --- a/include/linux/shmem_fs.h
> +++ b/include/linux/shmem_fs.h
> @@ -60,6 +60,12 @@ extern struct page *shmem_read_mapping_page_gfp(struct 
> address_space *mapping,
>  extern void shmem_truncate_range(struct inode *inode, loff_t start, loff_t 
> end);
>  extern int shmem_unuse(swp_entry_t entry, struct page *page);
>  
> +#ifdef CONFIG_SWAP
> +extern unsigned long shmem_swap_usage(struct inode *inode);
> +extern unsigned long shmem_partial_swap_usage(struct address_space *mapping,
> + pgoff_t start, pgoff_t end);
> +#endif

CONFIG_SWAP is wrong, isn't it?  It should be CONFIG_SHMEM if anything.

I'd just do

--- a/include/linux/shmem_fs.h~mm-proc-account-for-shmem-swap-in-proc-pid-smaps-fix
+++ a/include/linux/shmem_fs.h
@@ -60,11 +60,9 @@ extern struct page *shmem_read_mapping_p
 extern void shmem_truncate_range(struct inode *inode, loff_t start, loff_t end);
 extern int shmem_unuse(swp_entry_t entry, struct page *page);
 
-#ifdef CONFIG_SWAP
 extern unsigned long shmem_swap_usage(struct inode *inode);
 extern unsigned long shmem_partial_swap_usage(struct address_space *mapping,
pgoff_t start, pgoff_t end);
-#endif
 
 static inline struct page *shmem_read_mapping_page(
struct address_space *mapping, pgoff_t index)


We don't need the ifdefs around declarations and they're a pain to
maintain and they'd add a *ton* of clutter if we even tried to do this
for real.
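
The underlying convention: declarations may stay unconditional because the
definitions live in a file that is only compiled when the feature is enabled,
so a caller that is wrongly compiled in fails at link time. A hypothetical
example of the pattern:

/* header: no #ifdef needed around the declaration */
extern unsigned long example_swap_usage(struct inode *inode);

/*
 * The definition lives in mm/example.c, which the Makefile only builds
 * via obj-$(CONFIG_EXAMPLE) += example.o; when CONFIG_EXAMPLE is off,
 * the unused declaration costs nothing.
 */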