[PATCH 4/4] maps: /proc/<pid>/pmaps interface - memory maps in granularity of pages

2007-08-19 Thread Fengguang Wu
Show a process's page-by-page address space information in /proc/<pid>/pmaps.
It helps to analyze an application's memory footprint in a comprehensive way.

Pages sharing the same state are grouped into a page range.
For each page range, the following fields are exported:
- [HEX NUM] first page index
- [HEX NUM] number of pages in the range
- [STRING]  well known page/pte flags
- [DEC NUM] number of mmap users

Only page flags not expected to disappear in the near future are exported:

Y:pteyoung R:referenced A:active U:uptodate P:ptedirty D:dirty W:writeback

Here is a sample output:

wfg ~% cat /proc/$$/pmaps
00400000-00492000 r-xp 00000000 08:01 1727526
/bin/zsh4
0   1   YRAU___ 7
2   16  YRAU___ 7
19  5   YRAU___ 7
20  18  YRAU___ 7
38  1   YRAU___ 6
39  9   YRAU___ 7
43  46  YRAU___ 7
00691000-00697000 rw-p 00091000 08:01 1727526
/bin/zsh4
91  6   Y_A_P__ 1
00697000-00b6e000 rw-p 00697000 00:00 0  [heap]
698 1   Y_A_P__ 1
69c 1   Y_A_P__ 1
69e 2   Y_A_P__ 1
6a6 1   Y_A_P__ 1
6a8 2   Y_A_P__ 1
6ad 4bc Y_A_P__ 1
b6a 2   Y_A_P__ 1
2b661b3ea000-2b661b407000 r-xp 00000000 08:01 1563879
/lib/ld-2.6.so
0   1   YRAU___ 78
1   4   YRAU___ 56
5   2   YRAU___ 70
7   1   YRAU___ 65
8   1   YRAU___ 70
9   1   YRAU___ 96
a   1   YRAU___ 72
b   2   YRAU___ 70
d   2   YRAU___ 96
f   1   YRAU___ 70
10  1   YRAU___ 58
11  1   YRAU___ 52
12  1   YRAU___ 19
13  1   YRAU___ 96
14  1   YRAU___ 57
15  1   YRAU___ 96
16  1   YRAU___ 71
17  1   YRAU___ 96
18  1   YRAU___ 52
1a  1   YRAU___ 70
2b661b407000-2b661b40a000 rw-p 2b661b407000 00:00 0
2b661b407   3   Y_A_P__ 1
2b661b606000-2b661b608000 rw-p 0001c000 08:01 1563879
/lib/ld-2.6.so
1c  2   Y_A_P__ 1
[...]
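
For readers who want to post-process this output, here is a minimal userspace
sketch (not part of the patch) that sums the resident pages reported for each
mapping. It assumes the format shown above: VMA header lines in the usual maps
style, an optional file name line, and four-column page range lines; the file
name pmaps_rss.c is made up for illustration.

/* pmaps_rss.c - tally resident pages per VMA from /proc/<pid>/pmaps
 * (illustrative only; pmaps is the proposed, not yet merged, interface) */
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    char path[64], line[256], flags[32];
    unsigned long idx, npages, rss = 0, total = 0;
    int users;
    FILE *fp;

    snprintf(path, sizeof(path), "/proc/%s/pmaps",
             argc > 1 ? argv[1] : "self");
    fp = fopen(path, "r");
    if (!fp)
        return 1;

    while (fgets(line, sizeof(line), fp)) {
        char *dash = strchr(line, '-');
        char *space = strchr(line, ' ');

        if (dash && space && dash < space) {
            /* VMA header line: report the previous mapping's pages */
            if (rss)
                printf("%8lu pages resident\n", rss);
            total += rss;
            rss = 0;
            fputs(line, stdout);
        } else if (sscanf(line, "%lx %lx %31s %d",
                          &idx, &npages, flags, &users) == 4) {
            /* page range line: index, length, flags, mapcount */
            rss += npages;
        }
        /* anything else (e.g. a file name line) is ignored */
    }
    total += rss;
    if (rss)
        printf("%8lu pages resident\n", rss);
    printf("total: %lu resident pages\n", total);
    return 0;
}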

Matt Mackall's pagemap/kpagemap and John Berthels's exmap can achieve the same
goals, and probably more. But this text-based pmaps interface should be easier
to use.

The concern about data set size is addressed by working in a sparse way.

1) It only generates output for resident pages, which is normally much
   smaller than the mapped size. For example, the VSZ:RSS ratio of ALL
   processes combined is 4516796KB:457048KB ~= 10:1.

2) The page range trick suppresses more output. For example, my
   running firefox has a (RSS_pages:page_ranges) of 16k:2k ~= 8:1.

It's interesting to see that the seq_file interface demands somewhat more
programming effort, but also provides welcome flexibility in return.

In the worst case, where every resident page gets its own line in pmaps, a 4GB
RSS can produce up to 1M pmaps lines, or about 20MB of data.

Cc: Jeremy Fitzhardinge <[EMAIL PROTECTED]>
Cc: David Rientjes <[EMAIL PROTECTED]>
Cc: Matt Mackall <[EMAIL PROTECTED]>
Cc: John Berthels <[EMAIL PROTECTED]>
Cc: Nick Piggin <[EMAIL PROTECTED]>
Signed-off-by: Fengguang Wu <[EMAIL PROTECTED]>
---
 fs/proc/base.c |7 +
 fs/proc/internal.h |1 
 fs/proc/task_mmu.c |  180 +++
 3 files changed, 188 insertions(+)

--- linux-2.6.23-rc2-mm2.orig/fs/proc/task_mmu.c
+++ linux-2.6.23-rc2-mm2/fs/proc/task_mmu.c
@@ -752,3 +752,183 @@ const struct file_operations proc_numa_m
.release= seq_release_private,
 };
 #endif /* CONFIG_NUMA */
+
+struct pmaps_private {
+   struct proc_maps_private pmp;
+   struct vm_area_struct *vma;
+   struct seq_file *m;
+   /* page range attrs */
+   unsigned long offset;
+   unsigned long len;
+   unsigned long flags;
+   int mapcount;
+};
+
+#define PMAPS_BUF_SIZE   (64<<10)  /* 64K */
+#define PMAPS_BATCH_SIZE (16<<20)  /* 16M */
+
+#define PG_YOUNG   PG_readahead/* reuse any non-relevant flag */
+#define PG_DIRTY   PG_lru  /* ditto */
+
+static unsigned long page_mask;
+
+static struct {
+   unsigned long   mask;
+   const char *name;
+   boolfaked;
+} page_flags [] = {
+   {1 << PG_YOUNG, "Y:pteyoung",   1},
+   {1 << PG_referenced,"R:referenced", 0},
+   {1 << PG_active,"A:active", 0},
+
+   {1 << PG_uptodate,  "U:uptodate",   0},
+   {1 << PG_DIRTY, "P:ptedirty",   1},
+   {1 << PG_dirty, "D:dirty",  0},
+   {1 << PG_writeback, "W:writeback",  0},
+};
+
+static unsigned long pte_page_flags(pte_t ptent, struct page* page)
+{
+   unsigned long flags;
+
+   flags = page->flags & page_mask;
+
+   if (pte_young(ptent))
+   flags |= (1 << PG_YOUNG);
+
+   if (pte_dirty(ptent))
+   flags |= (1 << PG_DIRTY);
+
+   return flags;
+}
+
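
The hunk shown here stops before the code that turns this flags word into the
"YRAU___" strings seen in the sample output. A plausible sketch of that step,
built only on the page_flags[] table above (the helper name is invented, not
taken from the patch):

/* Hypothetical helper: print one character per page_flags[] entry -
 * the letter before the ':' when the bit is set, '_' when it is clear. */
static void pmaps_show_flags(struct seq_file *m, unsigned long flags)
{
    int i;

    for (i = 0; i < ARRAY_SIZE(page_flags); i++)
        seq_putc(m, (flags & page_flags[i].mask) ?
                    page_flags[i].name[0] : '_');
}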

Re: [PATCH 4/4] maps: /proc/<pid>/pmaps interface - memory maps in granularity of pages

2007-08-18 Thread Fengguang Wu
On Sat, Aug 18, 2007 at 12:22:26PM -0500, Matt Mackall wrote:
> > > > So VSZ:RSS ratio actually goes up with memory pressure.
> > > 
> > > And yes.
> > > 
> > > But that's not what I'm talking about. You're likely to have more
> > > holes in your ranges with memory pressure as things that aren't active
> > > get paged or swapped out and back in. And because we're walking the
> > > LRU more rapidly, we'll flip over a lot of the active bits more often
> > > which will mean more output.
> > > 
> > > >   - page range is a good unit of locality. They are more likely to be
> > > > reclaimed as a whole. So (RSS:page_ranges) wouldn't degrade as much.
> > > 
> > > There is that. The relative magnitude of the different effects is
> > > unclear. But it is clear that the worst case for pmap is much worse
> > 
> > > than pagemap (two lines per page of RSS?). 
> > It's one line per page. No sane app will make vmas proliferate.
> 
> Sane apps are few and far between.

Very likely, and they will bloat maps/smaps/pmaps alike :(

> > So let's talk about the worst case.
> > 
> > pagemap's data set size is determined by VSZ.
> > 4GB VSZ means 1M PFNs, hence 8MB pagemap data.
> > 
> > pmaps's data set size is bounded by RSS hence physical memory.
> > 4GB RSS means up to 1M page ranges, hence ~20M pmaps data.
> > Not too bad :)
> 
> Hmmm, I've been misreading the output.
> 
> What does it do with nonlinear VMAs?

The implementation gets the offset from page_index(page), so it works the
same way for linear and nonlinear VMAs. Depending on how one does the
remap_file_pages() calls, the output lines may not be strictly ordered by
offset, may overlap, or may contain small page ranges.
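
For reference, a nonlinear VMA of the kind discussed here can be set up with
remap_file_pages(2); a minimal sketch (not from the patch) that would make the
pmaps offsets for one VMA appear out of order:

#define _GNU_SOURCE
#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>

/* Map 4 file pages, then point the first window page at file page 3.
 * The pmaps offsets for this VMA are then no longer monotonic. */
int make_nonlinear(const char *file)
{
    long psz = sysconf(_SC_PAGESIZE);
    int fd = open(file, O_RDONLY);
    void *addr;

    if (fd < 0)
        return -1;
    addr = mmap(NULL, 4 * psz, PROT_READ, MAP_SHARED, fd, 0);
    if (addr == MAP_FAILED)
        return -1;
    /* window page 0 now shows file page 3 (pgoff = 3) */
    return remap_file_pages(addr, psz, 0, 3, 0);
}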



Re: [PATCH 4/4] maps: /proc/<pid>/pmaps interface - memory maps in granularity of pages

2007-08-18 Thread Matt Mackall
On Sat, Aug 18, 2007 at 04:45:31PM +0800, Fengguang Wu wrote:
> Matt,
> 
> On Sat, Aug 18, 2007 at 01:40:42AM -0500, Matt Mackall wrote:
> > > - On memory pressure,
> > >   - as VSZ goes up, RSS will be bounded by physical memory.
> > > So VSZ:RSS ratio actually goes up with memory pressure.
> > 
> > And yes.
> > 
> > But that's not what I'm talking about. You're likely to have more
> > holes in your ranges with memory pressure as things that aren't active
> > get paged or swapped out and back in. And because we're walking the
> > LRU more rapidly, we'll flip over a lot of the active bits more often
> > which will mean more output.
> > 
> > >   - page range is a good unit of locality. They are more likely to be
> > > reclaimed as a whole. So (RSS:page_ranges) wouldn't degrade as much.
> > 
> > There is that. The relative magnitude of the different effects is
> > unclear. But it is clear that the worst case for pmap is much worse
> 
> > than pagemap (two lines per page of RSS?). 
> It's one line per page. No sane app will make vmas proliferate.

Sane apps are few and far between.
 
> So let's talk about the worst case.
> 
> pagemap's data set size is determined by VSZ.
> 4GB VSZ means 1M PFNs, hence 8MB pagemap data.
> 
> pmaps's data set size is bounded by RSS hence physical memory.
> 4GB RSS means up to 1M page ranges, hence ~20M pmaps data.
> Not too bad :)

Hmmm, I've been misreading the output.

What does it do with nonlinear VMAs?

-- 
Mathematics is the supreme nostalgia of our time.


Re: [PATCH 4/4] maps: /proc/<pid>/pmaps interface - memory maps in granularity of pages

2007-08-18 Thread Fengguang Wu
On Sat, Aug 18, 2007 at 01:40:42AM -0500, Matt Mackall wrote:
> > > - you don't get page frame numbers
> > 
> > True. I guess PFNs are meaningless to a normal user?
> 
> They're useful for anyone who's trying to look at the system as a
> whole.

To answer the question "who is sharing this page with me?",
PFNs are not the only option. The tuple dev/ino/offset can also
uniquely identify the shared page :)
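
In other words, a userspace tool could key shared pages on that tuple rather
than on a PFN. A purely illustrative sketch of such a key (the struct and
helper names are made up):

#include <sys/types.h>

/* A (device, inode, file offset) triple names a page-cache page, so two
 * processes whose pmaps entries resolve to the same triple share that page. */
struct shared_page_key {
    dev_t         dev;    /* the "08:01" device field */
    ino_t         ino;    /* the inode number field */
    unsigned long pgoff;  /* page index within the file */
};

static inline int same_page(const struct shared_page_key *a,
                            const struct shared_page_key *b)
{
    return a->dev == b->dev && a->ino == b->ino && a->pgoff == b->pgoff;
}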



Re: [PATCH 4/4] maps: /proc/<pid>/pmaps interface - memory maps in granularity of pages

2007-08-18 Thread Fengguang Wu
Matt,

On Sat, Aug 18, 2007 at 01:40:42AM -0500, Matt Mackall wrote:
> > - On memory pressure,
> >   - as VSZ goes up, RSS will be bounded by physical memory.
> > So VSZ:RSS ratio actually goes up with memory pressure.
> 
> And yes.
> 
> But that's not what I'm talking about. You're likely to have more
> holes in your ranges with memory pressure as things that aren't active
> get paged or swapped out and back in. And because we're walking the
> LRU more rapidly, we'll flip over a lot of the active bits more often
> which will mean more output.
> 
> >   - page range is a good unit of locality. They are more likely to be
> > reclaimed as a whole. So (RSS:page_ranges) wouldn't degrade as much.
> 
> There is that. The relative magnitude of the different effects is
> unclear. But it is clear that the worst case for pmap is much worse

> than pagemap (two lines per page of RSS?). 
It's one line per page. No sane app will make vmas proliferate.

So let's talk about the worst case.

pagemap's data set size is determined by VSZ.
4GB VSZ means 1M PFNs, hence 8MB pagemap data.

pmaps's data set size is bounded by RSS hence physical memory.
4GB RSS means up to 1M page ranges, hence ~20M pmaps data.
Not too bad :)

> > > But there are still the downsides I have mentioned:
> > > 
> > > - you don't get page frame numbers
> > 
> > True. I guess PFNs are meaningless to a normal user?
> 
> They're useful for anyone who's trying to look at the system as a
> whole.
>  
> > > - you can't do random access
> > 
> > Not for now.
> > 
> > It would be trivial to support seek-by-address semantic: the seqfile
> > operations already iterate by addresses. Only that we cannot do it via
> > the regular read/pread/seek interfaces. They have different semantic
> > on fpos. However, tricks like ioctl(begin_addr, end_addr) can be
> > employed if necessary.
> 
> I suppose. But if you're willing to stomach that sort of thing, you
> might as well use a simple binary interface.

Python can do ioctl() :)

Anyway it's already a special interface.



Re: [PATCH 4/4] maps: /proc/<pid>/pmaps interface - memory maps in granularity of pages

2007-08-18 Thread Matt Mackall
On Sat, Aug 18, 2007 at 10:48:31AM +0800, Fengguang Wu wrote:
> Matt,
> 
> On Fri, Aug 17, 2007 at 11:58:08AM -0500, Matt Mackall wrote:
> > On Fri, Aug 17, 2007 at 02:47:27PM +0800, Fengguang Wu wrote:
> > > It's not easy to do direct performance comparisons between pmaps and
> > > pagemap/kpagemap. However some close analyzes are still possible :)
> > > 
> > > 1) code size
> > > pmaps   ~200 LOC
> > > pagemap/kpagemap~300 LOC
> > > 
> > > 2) dataset size
> > > take for example my running firefox on Intel Core 2:
> > > VSZ 400 MB
> > > RSS  64 MB, or 16k pages
> > > pmaps    64 KB, wc shows 2k lines, or that many page ranges
> > > pagemap 800 KB, could be heavily optimized by returning partial 
> > > data
> > 
> > I take it you're in 64-bit mode?
> 
> Yes. That will be the common case.
> 
> > You're right, this data compresses well in many circumstances. I
> > suspect it will suffer under memory pressure though. That will
> > fragment the ranges in-memory and also fragment the active bits. The
> > worst case here is huge, of course, but realistically I'd expect
> > something like 2x-4x.
> 
> Not likely to degrade even under memory pressure ;)
> 
> The compress_ratio = (VSZ:RSS) * (RSS:page_ranges).
> - On fresh startup and no memory pressure,
>   - the VSZ:RSS ratio of ALL processes are 4516796KB:457048KB ~= 10:1.
>   - the firefox case shows a (RSS:page_ranges) of 16k:2k ~= 8:1.

Yes.

> - On memory pressure,
>   - as VSZ goes up, RSS will be bounded by physical memory.
> So VSZ:RSS ratio actually goes up with memory pressure.

And yes.

But that's not what I'm talking about. You're likely to have more
holes in your ranges with memory pressure as things that aren't active
get paged or swapped out and back in. And because we're walking the
LRU more rapidly, we'll flip over a lot of the active bits more often
which will mean more output.

>   - page range is a good unit of locality. They are more likely to be
> reclaimed as a whole. So (RSS:page_ranges) wouldn't degrade as much.

There is that. The relative magnitude of the different effects is
unclear. But it is clear that the worst case for pmap is much worse
than pagemap (two lines per page of RSS?). 

> > But there are still the downsides I have mentioned:
> > 
> > - you don't get page frame numbers
> 
> True. I guess PFNs are meaningless to a normal user?

They're useful for anyone who's trying to look at the system as a
whole.
 
> > - you can't do random access
> 
> Not for now.
> 
> It would be trivial to support seek-by-address semantic: the seqfile
> operations already iterate by addresses. Only that we cannot do it via
> the regular read/pread/seek interfaces. They have different semantic
> on fpos. However, tricks like ioctl(begin_addr, end_addr) can be
> employed if necessary.

I suppose. But if you're willing to stomach that sort of thing, you
might as well use a simple binary interface.
 
-- 
Mathematics is the supreme nostalgia of our time.


Re: [PATCH 4/4] maps: /proc/<pid>/pmaps interface - memory maps in granularity of pages

2007-08-17 Thread Fengguang Wu
Matt,

On Fri, Aug 17, 2007 at 11:58:08AM -0500, Matt Mackall wrote:
> On Fri, Aug 17, 2007 at 02:47:27PM +0800, Fengguang Wu wrote:
> > It's not easy to do direct performance comparisons between pmaps and
> > pagemap/kpagemap. However some close analyzes are still possible :)
> > 
> > 1) code size
> > pmaps   ~200 LOC
> > pagemap/kpagemap~300 LOC
> > 
> > 2) dataset size
> > take for example my running firefox on Intel Core 2:
> > VSZ 400 MB
> > RSS  64 MB, or 16k pages
> > pmaps    64 KB, wc shows 2k lines, or that many page ranges
> > pagemap 800 KB, could be heavily optimized by returning partial data
> 
> I take it you're in 64-bit mode?

Yes. That will be the common case.

> You're right, this data compresses well in many circumstances. I
> suspect it will suffer under memory pressure though. That will
> fragment the ranges in-memory and also fragment the active bits. The
> worst case here is huge, of course, but realistically I'd expect
> something like 2x-4x.

Not likely to degrade even under memory pressure ;)

The compress_ratio = (VSZ:RSS) * (RSS:page_ranges).
- On fresh startup and no memory pressure,
  - the VSZ:RSS ratio of ALL processes combined is 4516796KB:457048KB ~= 10:1.
  - the firefox case shows a (RSS:page_ranges) of 16k:2k ~= 8:1.
- On memory pressure,
  - as VSZ goes up, RSS will be bounded by physical memory.
So VSZ:RSS ratio actually goes up with memory pressure.
  - page range is a good unit of locality. They are more likely to be
reclaimed as a whole. So (RSS:page_ranges) wouldn't degrade as much.

> But there are still the downsides I have mentioned:
> 
> - you don't get page frame numbers

True. I guess PFNs are meaningless to a normal user?

> - you can't do random access

Not for now.

It would be trivial to support seek-by-address semantics: the seqfile
operations already iterate by address. Only we cannot do it via the
regular read/pread/seek interfaces, which have different semantics for
fpos. However, tricks like ioctl(begin_addr, end_addr) can be
employed if necessary.
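
If such an ioctl() were added, it might look something like the sketch below;
the structure and command number are entirely hypothetical and only illustrate
the ioctl(begin_addr, end_addr) idea:

#include <linux/ioctl.h>

/* Hypothetical: restrict pmaps output to one address range. */
struct pmaps_range {
    unsigned long begin_addr;   /* first address of interest */
    unsigned long end_addr;     /* one past the last address */
};

#define PMAPS_IOC_MAGIC     'p'
#define PMAPS_IOC_SET_RANGE _IOW(PMAPS_IOC_MAGIC, 1, struct pmaps_range)

/* Userspace would then do something like:
 *     struct pmaps_range r = { begin, end };
 *     ioctl(fd, PMAPS_IOC_SET_RANGE, &r);
 *     read(fd, buf, sizeof(buf));    (only that range is emitted)
 */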

> And how long does it take to pull the data out? My benchmarks show
> greater than 50MB/s (and that's with the version in -mm that's doing
> double buffering), so that 800K would take < .016s. 

You are right :)

> > kpagemap 256 KB
> > 
> > 3) runtime overheads
> > pmaps    2k lines of string processing (encode/decode)
> > kpagemap 16k seek()/read()s, and context switches (could be
> > optimized somehow by doing a PFN sort first, but
> > that's also non-trivial overhead)
> 
> You can do anywhere between 16k small reads or 1 large read. Depends

No way to avoid the seeks if PFNs are discontinuous. Too bad the
memory gets fragmented with uptime, at least on current kernels.

But sure, sequential reads are viable when doing whole system memory
analysis, or for memory hog processes.

> what data you're trying to get. Right now, kpagemap is fast enough
> that I can do realtime displays of the whole of memory in my desktop
> in a GUI written in Python. And Python is fairly horrible for drawing
> bitmaps and such.
> 
> http://www.selenic.com/Screenshot-kpagemap.png
> 
> > So pmaps seems to be a clear winner :)
> 
> Except that it's only providing a subset of the data.

Yes, and it's a nice graph :)



Re: [PATCH 4/4] maps: /proc/<pid>/pmaps interface - memory maps in granularity of pages

2007-08-17 Thread Matt Mackall
On Fri, Aug 17, 2007 at 02:47:27PM +0800, Fengguang Wu wrote:
> Matt,
> 
> It's not easy to do direct performance comparisons between pmaps and
> pagemap/kpagemap. However some close analyzes are still possible :)
> 
> 1) code size
> pmaps   ~200 LOC
> pagemap/kpagemap~300 LOC
> 
> 2) dataset size
> take for example my running firefox on Intel Core 2:
> VSZ 400 MB
> RSS  64 MB, or 16k pages
> pmaps    64 KB, wc shows 2k lines, or that many page ranges
> pagemap 800 KB, could be heavily optimized by returning partial data

I take it you're in 64-bit mode?

You're right, this data compresses well in many circumstances. I
suspect it will suffer under memory pressure though. That will
fragment the ranges in-memory and also fragment the active bits. The
worst case here is huge, of course, but realistically I'd expect
something like 2x-4x.

But there are still the downsides I have mentioned:

- you don't get page frame numbers
- you can't do random access

And how long does it take to pull the data out? My benchmarks show
greater than 50MB/s (and that's with the version in -mm that's doing
double buffering), so that 800K would take < .016s. 

> kpagemap 256 KB
> 
> 3) runtime overheads
> pmaps    2k lines of string processing (encode/decode)
> kpagemap 16k seek()/read()s, and context switches (could be
> optimized somehow by doing a PFN sort first, but
> that's also non-trivial overhead)

You can do anywhere between 16k small reads or 1 large read. Depends
what data you're trying to get. Right now, kpagemap is fast enough
that I can do realtime displays of the whole of memory in my desktop
in a GUI written in Python. And Python is fairly horrible for drawing
bitmaps and such.

http://www.selenic.com/Screenshot-kpagemap.png

> So pmaps seems to be a clear winner :)

Except that it's only providing a subset of the data.

-- 
Mathematics is the supreme nostalgia of our time.


Re: [PATCH 4/4] maps: /proc/<pid>/pmaps interface - memory maps in granularity of pages

2007-08-17 Thread Fengguang Wu
Matt,

It's not easy to do direct performance comparisons between pmaps and
pagemap/kpagemap. However some close analyses are still possible :)

1) code size
pmaps            ~200 LOC
pagemap/kpagemap ~300 LOC

2) dataset size
take for example my running firefox on Intel Core 2:
VSZ      400 MB
RSS       64 MB, or 16k pages
pmaps     64 KB, wc shows 2k lines, or that many page ranges
pagemap  800 KB, could be heavily optimized by returning partial data
kpagemap 256 KB

3) runtime overheads
pmaps    2k lines of string processing (encode/decode)
kpagemap 16k seek()/read()s, and context switches (could be
         optimized somehow by doing a PFN sort first, but
         that's also non-trivial overhead)

So pmaps seems to be a clear winner :)

Thank you,
Fengguang



Re: [PATCH 4/4] maps: /proc/<pid>/pmaps interface - memory maps in granularity of pages

2007-08-16 Thread Matt Mackall
On Fri, Aug 17, 2007 at 11:44:37AM +0800, Fengguang Wu wrote:
> > I'm so-so on this. 
> 
> Not that way! It's a good thing that people have different experiences
> and hence viewpoints. Maybe the concept of PFN sharing are
> straightforward to you, while I have been playing with seq_file a lot.
> 
> > On the downside:
> > 
> > - requires lots of parsing
> > - isn't random-access
> > - probably significantly slower than pagemap
> 
> That could be true.  Maybe some user with huge datasets will give us
> some idea about the performance. I don't know, maybe it's application
> dependent.
> 
> Anyway I don't think it's fair to merge a binary interface without the
> challenge from a textual one ;)

Yes, that's why I didn't say I hated it.

-- 
Mathematics is the supreme nostalgia of our time.


Re: [PATCH 4/4] maps: /proc/<pid>/pmaps interface - memory maps in granularity of pages

2007-08-16 Thread Fengguang Wu
On Thu, Aug 16, 2007 at 09:38:46PM -0500, Matt Mackall wrote:
> On Fri, Aug 17, 2007 at 06:05:20AM +0800, Fengguang Wu wrote:
> > Show a process's page-by-page address space information in /proc/<pid>/pmaps.
> > It helps to analyze applications' memory footprints in a comprehensive way.
> > 
> > Pages sharing the same state are grouped into a page range.
> > For each page range, the following fields are exported:
> > - first page index
> > - number of pages in the range
> > - well known page/pte flags
> > - number of mmap users
> > 
> > Only page flags not expected to disappear in the near future are exported:
> > 
> > Y:young R:referenced A:active U:uptodate P:ptedirty D:dirty W:writeback
> ...
> 
> > The concern of dataset size is taken care of by working in a sparse way:
> > 
> > 1) It will only generate output for resident pages, that normally is
> > much smaller than the mapped size. Take my shell for example, the
> > (size:rss) ratio is (7:1)!
> > 
> > wfg ~% cat /proc/$$/smaps |grep Size|sum
> > sum  50552.000
> > avg777.723
> > 
> > wfg ~% cat /proc/$$/smaps |grep Rss|sum
> > sum   7604.000
> > avg116.985
> > 
> > 2) The page range trick suppresses more output.
> > 
> > It's interesting to see that the seq_file interface demands some
> > more programming efforts, and provides such flexibility as well.
> 
> I'm so-so on this. 

Not that way! It's a good thing that people have different experiences
and hence viewpoints. Maybe the concept of PFN sharing is
straightforward to you, while I have been playing with seq_file a lot.

> On the downside:
> 
> - requires lots of parsing
> - isn't random-access
> - probably significantly slower than pagemap

That could be true.  Maybe some user with huge datasets will give us
some idea about the performance. I don't know, maybe it's application
dependent.

Anyway I don't think it's fair to merge a binary interface without the
challenge from a textual one ;)

Thank you,
Fengguang



Re: [PATCH 4/4] maps: /proc/<pid>/pmaps interface - memory maps in granularity of pages

2007-08-16 Thread Matt Mackall
On Fri, Aug 17, 2007 at 06:05:20AM +0800, Fengguang Wu wrote:
> Show a process's page-by-page address space information in /proc/<pid>/pmaps.
> It helps to analyze applications' memory footprints in a comprehensive way.
> 
> Pages sharing the same state are grouped into a page range.
> For each page range, the following fields are exported:
>   - first page index
>   - number of pages in the range
>   - well known page/pte flags
>   - number of mmap users
> 
> Only page flags not expected to disappear in the near future are exported:
> 
>   Y:young R:referenced A:active U:uptodate P:ptedirty D:dirty W:writeback
...

> The concern of dataset size is taken care of by working in a sparse way:
> 
> 1) It will only generate output for resident pages, that normally is
> much smaller than the mapped size. Take my shell for example, the
> (size:rss) ratio is (7:1)!
> 
> wfg ~% cat /proc/$$/smaps |grep Size|sum
> sum  50552.000
> avg777.723
> 
> wfg ~% cat /proc/$$/smaps |grep Rss|sum
> sum   7604.000
> avg116.985
> 
> 2) The page range trick suppresses more output.
> 
> It's interesting to see that the seq_file interface demands some
> more programming efforts, and provides such flexibility as well.

I'm so-so on this. 

On the downside:

- requires lots of parsing
- isn't random-access
- probably significantly slower than pagemap

-- 
Mathematics is the supreme nostalgia of our time.


[PATCH 4/4] maps: /proc/<pid>/pmaps interface - memory maps in granularity of pages

2007-08-16 Thread Fengguang Wu
Show a process's page-by-page address space information in /proc/<pid>/pmaps.
It helps to analyze applications' memory footprints in a comprehensive way.

Pages sharing the same state are grouped into a page range.
For each page range, the following fields are exported:
- first page index
- number of pages in the range
- well known page/pte flags
- number of mmap users

Only page flags not expected to disappear in the near future are exported:

Y:young R:referenced A:active U:uptodate P:ptedirty D:dirty W:writeback

Here is a sample output:

# cat /proc/$$/pmaps
08048000-080c9000 r-xp 08048000 00:00 0
8048    81  Y_A_P__ 1
080c9000-080f8000 rwxp 080c9000 00:00 0  [heap]
80c9    2f  Y_A_P__ 1
f7e1c000-f7e25000 r-xp 00000000 03:00 176633
/lib/libnss_files-2.3.6.so
0   1   Y_AU___ 1
1   1   YR_U___ 1
5   1   YR_U___ 1
8   1   YR_U___ 1
f7e25000-f7e27000 rwxp 00008000 03:00 176633
/lib/libnss_files-2.3.6.so
8   2   Y_A_P__ 1
f7e27000-f7e2f000 r-xp 00000000 03:00 176635
/lib/libnss_nis-2.3.6.so
0   1   Y_AU___ 1
1   1   YR_U___ 1
4   1   YR_U___ 1
7   1   Y_AU___ 1
f7e2f000-f7e31000 rwxp 00007000 03:00 176635
/lib/libnss_nis-2.3.6.so
7   2   Y_A_P__ 1
f7e31000-f7e43000 r-xp 00000000 03:00 176630
/lib/libnsl-2.3.6.so
0   1   Y_AU___ 1
1   3   YR_U___ 1
10  1   YR_U___ 1
f7e43000-f7e45000 rwxp 00011000 03:00 176630 
/lib/libnsl-2.3.6.so
11  2   Y_A_P__ 1
f7e45000-f7e47000 rwxp f7e45000 00:00 0
f7e47000-f7e4f000 r-xp 00000000 03:00 176631
/lib/libnss_compat-2.3.6.so
0   1   Y_AU___ 1
1   3   YR_U___ 1
7   1   Y_AU___ 1
f7e4f000-f7e51000 rwxp 00007000 03:00 176631
/lib/libnss_compat-2.3.6.so
7   2   Y_A_P__ 1
f7e51000-f7f79000 r-xp 00000000 03:00 176359
/lib/libc-2.3.6.so
0   16  YRAU___ 2
19  1   YR_U___ 1
1f  1   YRAU___ 2
21  1   YRAU___ 1
22  2   YRAU___ 2
24  1   YRAU___ 1
26  1   YRAU___ 2
[...]


Matt Mackall's pagemap/kpagemap and John Berthels's exmap can achieve the same
goals, and probably more. But this text-based pmaps interface should be easier
to use.

The concern about dataset size is addressed by working in a sparse way:

1) It only generates output for resident pages, which is normally
much smaller than the mapped size. Take my shell for example: the
(size:rss) ratio is (7:1)!

wfg ~% cat /proc/$$/smaps |grep Size|sum
sum  50552.000
avg    777.723

wfg ~% cat /proc/$$/smaps |grep Rss|sum
sum   7604.000
avg    116.985

2) The page range trick suppresses more output.

It's interesting to see that the seq_file interface demands somewhat more
programming effort, but also provides welcome flexibility in return.

Cc: Jeremy Fitzhardinge <[EMAIL PROTECTED]>
Cc: David Rientjes <[EMAIL PROTECTED]>
Cc: Matt Mackall <[EMAIL PROTECTED]>
Cc: John Berthels <[EMAIL PROTECTED]>
Cc: Nick Piggin <[EMAIL PROTECTED]>
Signed-off-by: Fengguang Wu <[EMAIL PROTECTED]>
---
 fs/proc/base.c |7 +
 fs/proc/internal.h |1 
 fs/proc/task_mmu.c |  171 +++
 3 files changed, 179 insertions(+)

--- linux-2.6.23-rc2-mm2.orig/fs/proc/task_mmu.c
+++ linux-2.6.23-rc2-mm2/fs/proc/task_mmu.c
@@ -754,3 +754,174 @@ const struct file_operations proc_numa_m
.release= seq_release_private,
 };
 #endif /* CONFIG_NUMA */
+
+struct pmaps_private {
+   struct proc_maps_private pmp;
+   struct vm_area_struct *vma;
+   struct seq_file *m;
+   /* page range attrs */
+   unsigned long offset;
+   unsigned long len;
+   unsigned long flags;
+   int mapcount;
+};
+
+#define PMAPS_BUF_SIZE   (64<<10)  /* 64K */
+#define PMAPS_BATCH_SIZE (16<<20)  /* 16M */
+
+#define PG_YOUNG   PG_readahead/* reuse any non-relevant flag */
+#define PG_DIRTY   PG_lru  /* ditto */
+
+static unsigned long page_mask;
+
+static struct {
+   unsigned long   mask;
+   const char *name;
+   boolfaked;
+} page_flags [] = {
+   {1 << PG_YOUNG, "Y:pteyoung",   1},
+   {1 << PG_referenced,"R:referenced", 0},
+   {1 << PG_active,"A:active", 0},
+
+   {1 << PG_uptodate,  "U:uptodate",   0},
+   {1 << PG_DIRTY, "P:ptedirty",   1},
+   {1 << PG_dirty, "D:dirty",  0},
+   {1 << PG_writeback, "W:writeback",  0},
+};
+
+static unsigned long pte_page_flags(pte_t ptent, struct page* page)
+{
+   unsigned long flags;
+
+   flags = page->flags & page_mask;
+
+   if (pte_young(ptent))
+   flags |= (1 << PG_YOUNG);