Re: [RFC][PATCH] procfs: Add /proc/pid/mapped_files

2015-01-16 Thread Kirill A. Shutemov
On Thu, Jan 15, 2015 at 05:15:43PM -0800, Andrew Morton wrote:
 On Thu, 15 Jan 2015 00:51:50 +0100 Rasmus Villemoes li...@rasmusvillemoes.dk wrote:
 
   There are still several flags unused in vma.vm_flags btw.
  
   I'm not sure that we can repurpose vm_pgoff (or vm_private_data) for
   this: a badly behaved thread could make its sp point at a random vma
   then trick the kernel into scribbling on that vma's vm_pgoff?
  
  Well, we could still check vm_file for being NULL before writing to
  vm_pgoff/vm_stack_tid. 
 
 Yes, I guess that would work.  We'd need to check that nobody else
 is already playing similar games with vm_pgoff.

Well, we do use ->vm_pgoff in anonymous VMAs. For rmap in particular --
vma_address().

-- 
 Kirill A. Shutemov
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC][PATCH] procfs: Add /proc/pid/mapped_files

2015-01-15 Thread Andrew Morton
On Thu, 15 Jan 2015 00:51:50 +0100 Rasmus Villemoes li...@rasmusvillemoes.dk wrote:

  There are still several flags unused in vma.vm_flags btw.
 
  I'm not sure that we can repurpose vm_pgoff (or vm_private_data) for
  this: a badly behaved thread could make its sp point at a random vma
  then trick the kernel into scribbling on that vma's vm_pgoff?
 
 Well, we could still check vm_file for being NULL before writing to
 vm_pgoff/vm_stack_tid. 

Yes, I guess that would work.  We'd need to check that nobody else
is already playing similar games with vm_pgoff.


Re: [RFC][PATCH] procfs: Add /proc/pid/mapped_files

2015-01-14 Thread Calvin Owens
On Wednesday 01/14 at 15:53 +0100, Rasmus Villemoes wrote:
 On Wed, Jan 14 2015, Siddhesh Poyarekar siddhesh.poyare...@gmail.com wrote:
 
  On 14 January 2015 at 19:43, Rasmus Villemoes li...@rasmusvillemoes.dk wrote:
  Just thinking out loud: Could one simply mark a VMA as being used for
  stack during the clone call (is there room in vm_flags, or does
  VM_GROWSDOWN already tell the whole story?), and then write the TID into
  a new field in the VMA - I think one could make a union with vm_pgoff so
  as not to enlarge the structure.
 
  vm_flags does not have space IIRC (that was my first approach at
  implementing this) and VM_GROWSDOWN is not sufficient.
 
 Looking at include/linux/mm.h:
 
 #define VM_GROWSDOWN	0x00000100	/* general info on the segment */
 #define VM_PFNMAP	0x00000400	/* Page-ranges managed without struct page, just pure PFN */
 #define VM_DENYWRITE	0x00000800	/* ETXTBSY on write attempts.. */
 
 It would seem that 0x00000200 is available (unless it is defined and used
 somewhere else).
 
  If we can make a union with vm_pgoff like you say, we probably don't
  need a flag value; a non-zero value could indicate that it is a thread
  stack.
 
 Well, only when combined with checking vm_file for being NULL. One would
 also need to ensure that vm_pgoff is 0 for any non-stack,
 non-file-backed VMA. At which point it is somewhat ugly. 
 
  One problem with caching the value on clone like this though is that
  the stack could change due to a setcontext, but AFAICT we don't care
  about that for the process stack either.
 
 If it is important, I guess one could update the info when a task calls
 setcontext.

If I understand the current behavior, the [stack] marker will get put
next to *any* mapping that encompasses the current value in the task's
%sp, regardless of how the mapping was created or ucontext stuff. If
you use flags on the VMA structs things could potentially be marked as
stacks even though %sp points somewhere else.

It's probable that nobody cares (you'd obviously have to be doing crazy
things to be pointing %sp at arbitrary places), but that's why I was
hesitant to mess with it.

Thanks,
Calvin
 
 Rasmus


Re: [RFC][PATCH] procfs: Add /proc/pid/mapped_files

2015-01-14 Thread Cyrill Gorcunov
On Wed, Jan 14, 2015 at 12:46:53PM -0800, Calvin Owens wrote:
   
   Restriction to CAP_SYS_ADMIN for follow_link is understandable, but why do we
   restrict readdir and readlink?
  
  We didn't think this functionality might be needed by anyone but us (the
  CRIU camp), so the rule of thumb was CONFIG_CHECKPOINT_RESTORE +
  CAP_SYS_ADMIN until something else strictly needed it. So I think now we
  can relax the security rules a bit and allow readdir and the like for
  owners.
 
 Ah, I feel silly for missing that. I'll send a patch to move map_files
 out from behind CONFIG_CHECKPOINT_RESTORE and change the permissions.

Sure


Re: [RFC][PATCH] procfs: Add /proc/pid/mapped_files

2015-01-14 Thread Calvin Owens
On Wednesday 01/14 at 18:33 +0300, Cyrill Gorcunov wrote:
 On Wed, Jan 14, 2015 at 05:25:01PM +0200, Kirill A. Shutemov wrote:
 ...
   
   This gives lsof and suchlike a way to determine the pathnames of files
   mapped into a process without incurring the O(N^2) behavior of the
   maps file.
  
  We already have /proc/PID/map_files/ directory which lists all mapped
  files. Should we consider relaxing permission checking there and move it
  outside CONFIG_CHECKPOINT_RESTORE instead?
  
  Restriction to CAP_SYS_ADMIN for follow_link is understandable, but why do we
  restrict readdir and readlink?
 
 We didn't think this functionality might be needed by anyone but us (the
 CRIU camp), so the rule of thumb was CONFIG_CHECKPOINT_RESTORE +
 CAP_SYS_ADMIN until something else strictly needed it. So I think now we
 can relax the security rules a bit and allow readdir and the like for
 owners.

Ah, I feel silly for missing that. I'll send a patch to move map_files
out from behind CONFIG_CHECKPOINT_RESTORE and change the permissions.

Thanks,
Calvin


Re: [RFC][PATCH] procfs: Add /proc/pid/mapped_files

2015-01-14 Thread Andrew Morton
On Wed, 14 Jan 2015 13:03:26 -0800 Calvin Owens calvinow...@fb.com wrote:

  Well, only when combined with checking vm_file for being NULL. One would
  also need to ensure that vm_pgoff is 0 for any non-stack,
  non-file-backed VMA. At which point it is somewhat ugly. 
  
   One problem with caching the value on clone like this though is that
   the stack could change due to a setcontext, but AFAICT we don't care
   about that for the process stack either.
  
  If it is important, I guess one could update the info when a task calls
  setcontext.
 
 If I understand the current behavior, the [stack] marker will get put
 next to *any* mapping that encompasses the current value in the task's
 %sp, regardless of how the mapping was created or ucontext stuff. If
 you use flags on the VMA structs things could potentially be marked as
 stacks even though %sp points somewhere else.
 
 It's probable that nobody cares (you'd obviously have to be doing crazy
 things to be pointing %sp at arbitrary places), but that's why I was
 hesitant to mess with it.

Fixing the N^2 search would of course be much better than adding a new
proc file to sidestep it.

Could we do something like refreshing the new vma.vm_flags:VM_IS_STACK
on each thread at the time when /proc/PID/maps is opened?  So do a walk
of the threads, use each thread's sp to hunt down the thread's stack's
vma, then set VM_IS_STACK and fill in the new vma.stack_tid field?

There are still several flags unused in vma.vm_flags btw.

I'm not sure that we can repurpose vm_pgoff (or vm_private_data) for
this: a badly behaved thread could make its sp point at a random vma
then trick the kernel into scribbling on that vma's vm_pgoff?  Adding a
new field to the vma wouldn't kill us, I guess.  That would remove the
need for a VM_IS_STACK.




Re: [RFC][PATCH] procfs: Add /proc/pid/mapped_files

2015-01-14 Thread Kirill A. Shutemov
On Tue, Jan 13, 2015 at 04:20:29PM -0800, Calvin Owens wrote:
 Commit b76437579d1344b6 ("procfs: mark thread stack correctly in
 proc/pid/maps") introduced logic to mark thread stacks with the
 [stack:%d] marker in /proc/pid/maps.
 
 This causes reading /proc/pid/maps to take O(N^2) time, where N is
 the number of threads sharing an address space, since each line of
 output requires iterating over the VMA list looking for ranges that
 correspond to the stack pointer in any task's register set. When
 dealing with highly-threaded Java applications, reading this file can
 take hours and trigger softlockup dumps.
 
 Eliminating the [stack:%d] marker is not a viable option since it's
 been there for some time, and I don't see a way to do the stack check
 more efficiently that wouldn't end up making the whole thing really
 ugly.

Can we find stack for threads once on seq_operations::start() and avoid
for_each_thread() on seq_operations::show() for each stack vma?

-- 
 Kirill A. Shutemov


Re: [RFC][PATCH] procfs: Add /proc/pid/mapped_files

2015-01-14 Thread Rasmus Villemoes
On Wed, Jan 14 2015, Andrew Morton a...@linux-foundation.org wrote:

 On Wed, 14 Jan 2015 13:03:26 -0800 Calvin Owens calvinow...@fb.com wrote:
 
 If I understand the current behavior, the [stack] marker will get put
 next to *any* mapping that encompasses the current value in the task's
 %sp, regardless of how the mapping was created or ucontext stuff. If
 you use flags on the VMA structs things could potentially be marked as
 stacks even though %sp points somewhere else.
 
 It's probable that nobody cares (you'd obviously have to be doing crazy
 things to be pointing %sp at arbitrary places), but that's why I was
 hesitant to mess with it.

 Fixing the N^2 search would of course be much better than adding a new
 proc file to sidestep it.

 Could we do something like refreshing the new vma.vm_flags:VM_IS_STACK
 on each thread at the time when /proc/PID/maps is opened?  So do a walk
 of the threads, use each thread's sp to hunt down the thread's stack's
 vma, then set VM_IS_STACK and fill in the new vma.stack_tid field?

So this would be roughly #tasks*log(#vmas) + #vmas. Sounds
good. Especially since all the work will be done by the reader, so
there's no extra bookkeeping to do in sys_clone etc. Concurrent readers
could influence what each other end up seeing, but most of the time the
update will be idempotent, and the information may be stale anyway by
the time the reader has a chance to process it.

 There are still several flags unused in vma.vm_flags btw.

 I'm not sure that we can repurpose vm_pgoff (or vm_private_data) for
 this: a badly behaved thread could make its sp point at a random vma
 then trick the kernel into scribbling on that vma's vm_pgoff?

Well, we could still check vm_file for being NULL before writing to
vm_pgoff/vm_stack_tid. 

 Adding a new field to the vma wouldn't kill us, I guess.  That would
 remove the need for a VM_IS_STACK.

Either way, it seems that that decision can be changed later.

Rasmus


Re: [RFC][PATCH] procfs: Add /proc/pid/mapped_files

2015-01-14 Thread Rasmus Villemoes
On Wed, Jan 14 2015, Siddhesh Poyarekar siddhesh.poyare...@gmail.com wrote:

 On 14 January 2015 at 19:43, Rasmus Villemoes li...@rasmusvillemoes.dk wrote:
 Just thinking out loud: Could one simply mark a VMA as being used for
 stack during the clone call (is there room in vm_flags, or does
 VM_GROWSDOWN already tell the whole story?), and then write the TID into
 a new field in the VMA - I think one could make a union with vm_pgoff so
 as not to enlarge the structure.

 vm_flags does not have space IIRC (that was my first approach at
 implementing this) and VM_GROWSDOWN is not sufficient.

Looking at include/linux/mm.h:

#define VM_GROWSDOWN	0x00000100	/* general info on the segment */
#define VM_PFNMAP	0x00000400	/* Page-ranges managed without struct page, just pure PFN */
#define VM_DENYWRITE	0x00000800	/* ETXTBSY on write attempts.. */

It would seem that 0x00000200 is available (unless it is defined and used
somewhere else).

 If we can make a union with vm_pgoff like you say, we probably don't
 need a flag value; a non-zero value could indicate that it is a thread
 stack.

Well, only when combined with checking vm_file for being NULL. One would
also need to ensure that vm_pgoff is 0 for any non-stack,
non-file-backed VMA. At which point it is somewhat ugly. 

 One problem with caching the value on clone like this though is that
 the stack could change due to a setcontext, but AFAICT we don't care
 about that for the process stack either.

If it is important, I guess one could update the info when a task calls
setcontext.

Rasmus


Re: [RFC][PATCH] procfs: Add /proc/pid/mapped_files

2015-01-14 Thread Kirill A. Shutemov
On Tue, Jan 13, 2015 at 04:20:29PM -0800, Calvin Owens wrote:
 Commit b76437579d1344b6 ("procfs: mark thread stack correctly in
 proc/pid/maps") introduced logic to mark thread stacks with the
 [stack:%d] marker in /proc/pid/maps.
 
 This causes reading /proc/pid/maps to take O(N^2) time, where N is
 the number of threads sharing an address space, since each line of
 output requires iterating over the VMA list looking for ranges that
 correspond to the stack pointer in any task's register set. When
 dealing with highly-threaded Java applications, reading this file can
 take hours and trigger softlockup dumps.
 
 Eliminating the [stack:%d] marker is not a viable option since it's
 been there for some time, and I don't see a way to do the stack check
 more efficiently that wouldn't end up making the whole thing really
 ugly.
 
 The use case I'm specifically concerned with is the lsof command, so
 this patch adds an additional file, mapped_files, that simply
 iterates over the VMAs associated with the task and outputs a
 newline-delimited list of the pathnames of the files associated with
 the VMAs, if any.
 
 This gives lsof and suchlike a way to determine the pathnames of files
 mapped into a process without incurring the O(N^2) behavior of the
 maps file.

We already have /proc/PID/map_files/ directory which lists all mapped
files. Should we consider relaxing permission checking there and move it
outside CONFIG_CHECKPOINT_RESTORE instead?

Restriction to CAP_SYS_ADMIN for follow_link is understandable, but why do we
restrict readdir and readlink?

-- 
 Kirill A. Shutemov


Re: [RFC][PATCH] procfs: Add /proc/pid/mapped_files

2015-01-14 Thread Siddhesh Poyarekar
On 14 January 2015 at 19:43, Rasmus Villemoes li...@rasmusvillemoes.dk wrote:
 Just thinking out loud: Could one simply mark a VMA as being used for
 stack during the clone call (is there room in vm_flags, or does
 VM_GROWSDOWN already tell the whole story?), and then write the TID into
 a new field in the VMA - I think one could make a union with vm_pgoff so
 as not to enlarge the structure.

vm_flags does not have space IIRC (that was my first approach at
implementing this) and VM_GROWSDOWN is not sufficient.  If we can make
a union with vm_pgoff like you say, we probably don't need a flag
value; a non-zero value could indicate that it is a thread stack.

One problem with caching the value on clone like this though is that
the stack could change due to a setcontext, but AFAICT we don't care
about that for the process stack either.

Siddhesh
-- 
http://siddhesh.in


Re: [RFC][PATCH] procfs: Add /proc/pid/mapped_files

2015-01-14 Thread Cyrill Gorcunov
On Wed, Jan 14, 2015 at 05:25:01PM +0200, Kirill A. Shutemov wrote:
...
  
  This gives lsof and suchlike a way to determine the pathnames of files
  mapped into a process without incurring the O(N^2) behavior of the
  maps file.
 
 We already have /proc/PID/map_files/ directory which lists all mapped
 files. Should we consider relaxing permission checking there and move it
 outside CONFIG_CHECKPOINT_RESTORE instead?
 
 Restriction to CAP_SYS_ADMIN for follow_link is understandable, but why do we
 restrict readdir and readlink?

We didn't think this functionality might be needed by anyone but us (the
CRIU camp), so the rule of thumb was CONFIG_CHECKPOINT_RESTORE +
CAP_SYS_ADMIN until something else strictly needed it. So I think now we
can relax the security rules a bit and allow readdir and the like for
owners.

Cyrill


Re: [RFC][PATCH] procfs: Add /proc/pid/mapped_files

2015-01-14 Thread Rasmus Villemoes
On Wed, Jan 14 2015, Calvin Owens calvinow...@fb.com wrote:

 Commit b76437579d1344b6 ("procfs: mark thread stack correctly in
 proc/pid/maps") introduced logic to mark thread stacks with the
 [stack:%d] marker in /proc/pid/maps.

 This causes reading /proc/pid/maps to take O(N^2) time, where N is
 the number of threads sharing an address space, since each line of
 output requires iterating over the VMA list looking for ranges that
 correspond to the stack pointer in any task's register set. When
 dealing with highly-threaded Java applications, reading this file can
 take hours and trigger softlockup dumps.

 Eliminating the [stack:%d] marker is not a viable option since it's
 been there for some time, and I don't see a way to do the stack check
 more efficiently that wouldn't end up making the whole thing really
 ugly.

Just thinking out loud: Could one simply mark a VMA as being used for
stack during the clone call (is there room in vm_flags, or does
VM_GROWSDOWN already tell the whole story?), and then write the TID into
a new field in the VMA - I think one could make a union with vm_pgoff so
as not to enlarge the structure.

This would allow eliminating the loop over tasks in vm_is_stack.

Rasmus


Re: [RFC][PATCH] procfs: Add /proc/pid/mapped_files

2015-01-13 Thread Calvin Owens
Here's a simple program to trigger the issue with /proc/pid/maps.

Thanks,
Calvin

/* Simple program to reproduce O(N^2) behavior reading /proc/pid/maps
 *
 * Example on a random server:
 *
 *  $ ./map_repro 0
 *  Spawning 0 threads
 *  Reading /proc/self/maps... read 2189 bytes in 1 syscalls in 33us!
 *  $ ./map_repro 10
 *  Spawning 10 threads
 *  Reading /proc/self/maps... read 3539 bytes in 1 syscalls in 55us!
 *  $ ./map_repro 100
 *  Spawning 100 threads
 *  Reading /proc/self/maps... read 15689 bytes in 4 syscalls in 373us!
 *  $ ./map_repro 1000
 *  Spawning 1000 threads
 *  Reading /proc/self/maps... read 137189 bytes in 34 syscalls in 32376us!
 *  $ ./map_repro 2000
 *  Spawning 2000 threads
 *  Reading /proc/self/maps... read 272189 bytes in 68 syscalls in 119980us!
 *  $ ./map_repro 4000
 *  Spawning 4000 threads
 *  Reading /proc/self/maps... read 544912 bytes in 134 syscalls in 712200us!
 *  $ ./map_repro 8000
 *  Spawning 8000 threads
 *  Reading /proc/self/maps... read 1090189 bytes in 268 syscalls in 3650718us!
 *  $ ./map_repro 16000
 *  Spawning 16000 threads
 *  Reading /proc/self/maps... read 2178189 bytes in 534 syscalls in 42701311us!
 */

#include <stdlib.h>
#include <stdio.h>
#include <errno.h>
#include <string.h>
#include <limits.h>
#include <pthread.h>
#include <unistd.h>
#include <time.h>
#include <fcntl.h>

static char buf[65536] = {0};
static void time_maps_read(void)
{
	struct timespec then, now;
	long usec_elapsed;
	int ret, fd;
	int count = 0;
	int rd = 0;

	fd = open("/proc/self/maps", O_RDONLY);
	if (fd == -1) {
		printf("Couldn't open /proc/self/maps, bailing...\n");
		return;
	}

	printf("Reading /proc/self/maps... ");
	ret = clock_gettime(CLOCK_MONOTONIC, &then);

	while (1) {
		ret = read(fd, buf, 65536);
		if (!ret || ret == -1)
			break;
		rd += ret;
		count++;
	}

	ret = clock_gettime(CLOCK_MONOTONIC, &now);
	usec_elapsed = (now.tv_sec - then.tv_sec) * 1000000L;
	usec_elapsed += (now.tv_nsec - then.tv_nsec) / 1000L;

	printf("read %d bytes in %d syscalls in %ldus!\n", rd, count,
	       usec_elapsed);
	close(fd);
}

static void *do_nothing_forever(void *unused)
{
	while (1)
		sleep(60);

	return NULL;
}

int main(int args, char **argv)
{
	int i, ret, threads_to_spawn = 0;
	pthread_t tmp;

	if (args != 1) {
		threads_to_spawn = atoi(argv[1]);
		printf("Spawning %d threads\n", threads_to_spawn);
	}

	for (i = 0; i < threads_to_spawn; i++) {
		ret = pthread_create(&tmp, NULL, do_nothing_forever, NULL);
		if (ret)
			printf("Thread %d failed to spawn?\n", i);
	}

	time_maps_read();
	return 0;
}


[RFC][PATCH] procfs: Add /proc/pid/mapped_files

2015-01-13 Thread Calvin Owens
Commit b76437579d1344b6 ("procfs: mark thread stack correctly in
proc/pid/maps") introduced logic to mark thread stacks with the
[stack:%d] marker in /proc/pid/maps.

This causes reading /proc/pid/maps to take O(N^2) time, where N is
the number of threads sharing an address space, since each line of
output requires iterating over the VMA list looking for ranges that
correspond to the stack pointer in any task's register set. When
dealing with highly-threaded Java applications, reading this file can
take hours and trigger softlockup dumps.

Eliminating the [stack:%d] marker is not a viable option since it's
been there for some time, and I don't see a way to do the stack check
more efficiently that wouldn't end up making the whole thing really
ugly.

The use case I'm specifically concerned with is the lsof command, so
this patch adds an additional file, mapped_files, that simply
iterates over the VMAs associated with the task and outputs a
newline-delimited list of the pathnames of the files associated with
the VMAs, if any.

This gives lsof and suchlike a way to determine the pathnames of files
mapped into a process without incurring the O(N^2) behavior of the
maps file.

Signed-off-by: Calvin Owens calvinow...@fb.com
---
I'm also sending a simple repro program as a reply to this E-Mail.

 fs/proc/base.c |  1 +
 fs/proc/internal.h |  1 +
 fs/proc/task_mmu.c | 32 
 3 files changed, 34 insertions(+)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 3f3d7ae..15f8bd0 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -2564,6 +2564,7 @@ static const struct pid_entry tgid_base_stuff[] = {
 	ONE("stat",       S_IRUGO, proc_tgid_stat),
 	ONE("statm",      S_IRUGO, proc_pid_statm),
 	REG("maps",       S_IRUGO, proc_pid_maps_operations),
+	REG("mapped_files", S_IRUGO, proc_mapped_files_operations),
 #ifdef CONFIG_NUMA
 	REG("numa_maps",  S_IRUGO, proc_pid_numa_maps_operations),
 #endif
diff --git a/fs/proc/internal.h b/fs/proc/internal.h
index 6fcdba5..a09bbdd 100644
--- a/fs/proc/internal.h
+++ b/fs/proc/internal.h
@@ -284,6 +284,7 @@ struct mm_struct *proc_mem_open(struct inode *inode, unsigned int mode);
 
 extern const struct file_operations proc_pid_maps_operations;
 extern const struct file_operations proc_tid_maps_operations;
+extern const struct file_operations proc_mapped_files_operations;
 extern const struct file_operations proc_pid_numa_maps_operations;
 extern const struct file_operations proc_tid_numa_maps_operations;
 extern const struct file_operations proc_pid_smaps_operations;
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 246eae8..bc101e0 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -412,6 +412,38 @@ const struct file_operations proc_tid_maps_operations = {
.release= proc_map_release,
 };
 
+static int show_next_mapped_file(struct seq_file *m, void *v)
+{
+	struct vm_area_struct *vma = v;
+	struct file *file = vma->vm_file;
+
+	if (file) {
+		seq_path(m, &file->f_path, "\n");
+		seq_putc(m, '\n');
+	}
+
+	return 0;
+}
+
+static const struct seq_operations mapped_files_seq_op = {
+   .start  = m_start,
+   .next   = m_next,
+   .stop   = m_stop,
+   .show   = show_next_mapped_file,
+};
+
+static int mapped_files_open(struct inode *inode, struct file *file)
+{
+   return do_maps_open(inode, file, mapped_files_seq_op);
+}
+
+const struct file_operations proc_mapped_files_operations = {
+   .open   = mapped_files_open,
+   .read   = seq_read,
+   .llseek = seq_lseek,
+   .release= seq_release_private,
+};
+
 /*
  * Proportional Set Size(PSS): my share of RSS.
  *
-- 
2.1.4
