RE: [PATCH 2/2] procfs: Add 'path' to /proc//fdinfo/

2022-06-01 Thread David Laight
From: Kalesh Singh
> Sent: 31 May 2022 23:30
...
> > File paths can contain fun characters like newlines or colons, which
> > could make parsing out filenames in this text file... fun. How would your
> > userspace parsing logic handle "/home/stephen/filename\nsize:\t4096"? The
> > readlink(2) API makes that easy already.
> 
> I think since we have escaped the "\n" (seq_file_path(m, file, "\n")),
> then user space might parse this line like:
> 
> if (strncmp(line, "path:\t", 6) == 0)
> char* path = line + 6;

The real annoyance is other things doing scans of the filesystem
that accidentally 'bump into' strange names.

While anything serious probably gets it right how many times
Do you run 'find' to quickly search for something?

Spaces in filenames (popularised by some other os) are a PITA.
Not to mention leading and trailing spaces!
Anyone using filenames that only contain spaces does need shooting.

Deliberately adding non-printables isn't really a good idea.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, 
UK
Registration No: 1397386 (Wales)


Re: [Linaro-mm-sig] Re: [PATCH 2/2] procfs: Add 'path' to /proc//fdinfo/

2022-06-01 Thread Christian König

Am 01.06.22 um 00:48 schrieb Stephen Brennan:

Kalesh Singh  writes:

On Tue, May 31, 2022 at 3:07 PM Stephen Brennan
 wrote:

On 5/31/22 14:25, Kalesh Singh wrote:

In order to identify the type of memory a process has pinned through
its open fds, add the file path to fdinfo output. This allows
identifying memory types based on common prefixes. e.g. "/memfd...",
"/dmabuf...", "/dev/ashmem...".

Access to /proc//fdinfo is governed by PTRACE_MODE_READ_FSCREDS
the same as /proc//maps which also exposes the file path of
mappings; so the security permissions for accessing path is consistent
with that of /proc//maps.

Hi Kalesh,

Hi Stephen,

Thanks for taking a look.


I think I see the value in the size field, but I'm curious about path,
which is available via readlink /proc//fd/, since those are
symlinks to the file themselves.

This could work if we are root, but the file permissions wouldn't
allow us to do the readlink on other processes otherwise. We want to
be able to capture the system state in production environments from
some trusted process with ptrace read capability.

Interesting, thanks for explaining. It seems weird to have a duplicate
interface for the same information but such is life.


Yeah, the size change is really straight forward but for this one I'm 
not 100% sure either.


Probably best to ping some core fs developer before going further with it.

BTW: Any preferred branch to push this upstream? If not I can take it 
through drm-misc-next.


Regards,
Christian.




File paths can contain fun characters like newlines or colons, which
could make parsing out filenames in this text file... fun. How would your
userspace parsing logic handle "/home/stephen/filename\nsize:\t4096"? The
readlink(2) API makes that easy already.

I think since we have escaped the "\n" (seq_file_path(m, file, "\n")),

I really should have read through that function before commenting,
thanks for teaching me something new :)

Stephen


then user space might parse this line like:

if (strncmp(line, "path:\t", 6) == 0)
 char* path = line + 6;


Thanks,
Kalesh


Is the goal avoiding races (e.g. file descriptor 3 is closed and reopened
to a different path between reading fdinfo and stating the fd)?

Stephen


Signed-off-by: Kalesh Singh 
---

Changes from rfc:
   - Split adding 'size' and 'path' into a separate patches, per Christian
   - Fix indentation (use tabs) in documentaion, per Randy

  Documentation/filesystems/proc.rst | 14 --
  fs/proc/fd.c   |  4 
  2 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/Documentation/filesystems/proc.rst 
b/Documentation/filesystems/proc.rst
index 779c05528e87..591f12d30d97 100644
--- a/Documentation/filesystems/proc.rst
+++ b/Documentation/filesystems/proc.rst
@@ -1886,14 +1886,16 @@ if precise results are needed.
  3.8  /proc//fdinfo/ - Information about opened file
  ---
  This file provides information associated with an opened file. The regular
-files have at least five fields -- 'pos', 'flags', 'mnt_id', 'ino', and 'size'.
+files have at least six fields -- 'pos', 'flags', 'mnt_id', 'ino', 'size',
+and 'path'.

  The 'pos' represents the current offset of the opened file in decimal
  form [see lseek(2) for details], 'flags' denotes the octal O_xxx mask the
  file has been created with [see open(2) for details] and 'mnt_id' represents
  mount ID of the file system containing the opened file [see 3.5
  /proc//mountinfo for details]. 'ino' represents the inode number of
-the file, and 'size' represents the size of the file in bytes.
+the file, 'size' represents the size of the file in bytes, and 'path'
+represents the file path.

  A typical output is::

@@ -1902,6 +1904,7 @@ A typical output is::
   mnt_id: 19
   ino:63107
   size:   0
+ path:   /dev/null

  All locks associated with a file descriptor are shown in its fdinfo too::

@@ -1920,6 +1923,7 @@ Eventfd files
   mnt_id: 9
   ino:63107
   size:   0
+ path:   anon_inode:[eventfd]
   eventfd-count:  5a

  where 'eventfd-count' is hex value of a counter.
@@ -1934,6 +1938,7 @@ Signalfd files
   mnt_id: 9
   ino:63107
   size:   0
+ path:   anon_inode:[signalfd]
   sigmask:0200

  where 'sigmask' is hex value of the signal mask associated
@@ -1949,6 +1954,7 @@ Epoll files
   mnt_id: 9
   ino:63107
   size:   0
+ path:   anon_inode:[eventpoll]
   tfd:5 events:   1d data:  pos:0 ino:61af 
sdev:7

  where 'tfd' is a target file descriptor number in decimal form,
@@ -1968,6 +1974,7 @@ For inotify files the format is the following::
   mnt_id: 9
   ino:63107
   size:   0
+ path:   anon_inode:inotify
   inotify wd:3 ino:9e7e sdev:800013 mask:800afce ignored_mask:0 
fhandle-bytes:8 fhandle-type:1 f_handle:7e9e640d1b6d

  where 'wd' is a watch 

Re: [PATCH 2/2] procfs: Add 'path' to /proc//fdinfo/

2022-06-01 Thread Stephen Brennan
On 5/31/22 14:25, Kalesh Singh wrote:
> In order to identify the type of memory a process has pinned through
> its open fds, add the file path to fdinfo output. This allows
> identifying memory types based on common prefixes. e.g. "/memfd...",
> "/dmabuf...", "/dev/ashmem...".
> 
> Access to /proc//fdinfo is governed by PTRACE_MODE_READ_FSCREDS
> the same as /proc//maps which also exposes the file path of
> mappings; so the security permissions for accessing path is consistent
> with that of /proc//maps.

Hi Kalesh,

I think I see the value in the size field, but I'm curious about path,
which is available via readlink /proc//fd/, since those are
symlinks to the file themselves.

File paths can contain fun characters like newlines or colons, which
could make parsing out filenames in this text file... fun. How would your
userspace parsing logic handle "/home/stephen/filename\nsize:\t4096"? The
readlink(2) API makes that easy already.

Is the goal avoiding races (e.g. file descriptor 3 is closed and reopened
to a different path between reading fdinfo and stating the fd)?

Stephen

> Signed-off-by: Kalesh Singh 
> ---
> 
> Changes from rfc:
>   - Split adding 'size' and 'path' into a separate patches, per Christian
>   - Fix indentation (use tabs) in documentaion, per Randy
> 
>  Documentation/filesystems/proc.rst | 14 --
>  fs/proc/fd.c   |  4 
>  2 files changed, 16 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/filesystems/proc.rst 
> b/Documentation/filesystems/proc.rst
> index 779c05528e87..591f12d30d97 100644
> --- a/Documentation/filesystems/proc.rst
> +++ b/Documentation/filesystems/proc.rst
> @@ -1886,14 +1886,16 @@ if precise results are needed.
>  3.8  /proc//fdinfo/ - Information about opened file
>  ---
>  This file provides information associated with an opened file. The regular
> -files have at least five fields -- 'pos', 'flags', 'mnt_id', 'ino', and 
> 'size'.
> +files have at least six fields -- 'pos', 'flags', 'mnt_id', 'ino', 'size',
> +and 'path'.
>  
>  The 'pos' represents the current offset of the opened file in decimal
>  form [see lseek(2) for details], 'flags' denotes the octal O_xxx mask the
>  file has been created with [see open(2) for details] and 'mnt_id' represents
>  mount ID of the file system containing the opened file [see 3.5
>  /proc//mountinfo for details]. 'ino' represents the inode number of
> -the file, and 'size' represents the size of the file in bytes.
> +the file, 'size' represents the size of the file in bytes, and 'path'
> +represents the file path.
>  
>  A typical output is::
>  
> @@ -1902,6 +1904,7 @@ A typical output is::
>   mnt_id: 19
>   ino:63107
>   size:   0
> + path:   /dev/null
>  
>  All locks associated with a file descriptor are shown in its fdinfo too::
>  
> @@ -1920,6 +1923,7 @@ Eventfd files
>   mnt_id: 9
>   ino:63107
>   size:   0
> + path:   anon_inode:[eventfd]
>   eventfd-count:  5a
>  
>  where 'eventfd-count' is hex value of a counter.
> @@ -1934,6 +1938,7 @@ Signalfd files
>   mnt_id: 9
>   ino:63107
>   size:   0
> + path:   anon_inode:[signalfd]
>   sigmask:0200
>  
>  where 'sigmask' is hex value of the signal mask associated
> @@ -1949,6 +1954,7 @@ Epoll files
>   mnt_id: 9
>   ino:63107
>   size:   0
> + path:   anon_inode:[eventpoll]
>   tfd:5 events:   1d data:  pos:0 ino:61af 
> sdev:7
>  
>  where 'tfd' is a target file descriptor number in decimal form,
> @@ -1968,6 +1974,7 @@ For inotify files the format is the following::
>   mnt_id: 9
>   ino:63107
>   size:   0
> + path:   anon_inode:inotify
>   inotify wd:3 ino:9e7e sdev:800013 mask:800afce ignored_mask:0 
> fhandle-bytes:8 fhandle-type:1 f_handle:7e9e640d1b6d
>  
>  where 'wd' is a watch descriptor in decimal form, i.e. a target file
> @@ -1992,6 +1999,7 @@ For fanotify files the format is::
>   mnt_id: 9
>   ino:63107
>   size:   0
> + path:   anon_inode:[fanotify]
>   fanotify flags:10 event-flags:0
>   fanotify mnt_id:12 mflags:40 mask:38 ignored_mask:4003
>   fanotify ino:4f969 sdev:800013 mflags:0 mask:3b ignored_mask:4000 
> fhandle-bytes:8 fhandle-type:1 f_handle:69f90400c275b5b4
> @@ -2018,6 +2026,7 @@ Timerfd files
>   mnt_id: 9
>   ino:63107
>   size:   0
> + path:   anon_inode:[timerfd]
>   clockid: 0
>   ticks: 0
>   settime flags: 01
> @@ -2042,6 +2051,7 @@ DMA Buffer files
>   mnt_id: 9
>   ino:63107
>   size:   32768
> + path:   /dmabuf:
>   count:  2
>   exp_name:  system-heap
>  
> diff --git a/fs/proc/fd.c b/fs/proc/fd.c
> index 464bc3f55759..8889a8ba09d4 100644
> --- a/fs/proc/fd.c
> +++ b/fs/proc/fd.c
> @@ -60,6 +60,10 @@ static int seq_show(struct seq_file *m, void *v)
>  

Re: [PATCH 2/2] procfs: Add 'path' to /proc//fdinfo/

2022-06-01 Thread Stephen Brennan
Kalesh Singh  writes:
> On Tue, May 31, 2022 at 3:07 PM Stephen Brennan
>  wrote:
>>
>> On 5/31/22 14:25, Kalesh Singh wrote:
>> > In order to identify the type of memory a process has pinned through
>> > its open fds, add the file path to fdinfo output. This allows
>> > identifying memory types based on common prefixes. e.g. "/memfd...",
>> > "/dmabuf...", "/dev/ashmem...".
>> >
>> > Access to /proc//fdinfo is governed by PTRACE_MODE_READ_FSCREDS
>> > the same as /proc//maps which also exposes the file path of
>> > mappings; so the security permissions for accessing path is consistent
>> > with that of /proc//maps.
>>
>> Hi Kalesh,
>
> Hi Stephen,
>
> Thanks for taking a look.
>
>>
>> I think I see the value in the size field, but I'm curious about path,
>> which is available via readlink /proc//fd/, since those are
>> symlinks to the file themselves.
>
> This could work if we are root, but the file permissions wouldn't
> allow us to do the readlink on other processes otherwise. We want to
> be able to capture the system state in production environments from
> some trusted process with ptrace read capability.

Interesting, thanks for explaining. It seems weird to have a duplicate
interface for the same information but such is life.

>>
>> File paths can contain fun characters like newlines or colons, which
>> could make parsing out filenames in this text file... fun. How would your
>> userspace parsing logic handle "/home/stephen/filename\nsize:\t4096"? The
>> readlink(2) API makes that easy already.
>
> I think since we have escaped the "\n" (seq_file_path(m, file, "\n")),

I really should have read through that function before commenting,
thanks for teaching me something new :)

Stephen

> then user space might parse this line like:
>
> if (strncmp(line, "path:\t", 6) == 0)
> char* path = line + 6;
>
>
> Thanks,
> Kalesh
>
>>
>> Is the goal avoiding races (e.g. file descriptor 3 is closed and reopened
>> to a different path between reading fdinfo and stating the fd)?
>>
>> Stephen
>>
>> > Signed-off-by: Kalesh Singh 
>> > ---
>> >
>> > Changes from rfc:
>> >   - Split adding 'size' and 'path' into a separate patches, per Christian
>> >   - Fix indentation (use tabs) in documentaion, per Randy
>> >
>> >  Documentation/filesystems/proc.rst | 14 --
>> >  fs/proc/fd.c   |  4 
>> >  2 files changed, 16 insertions(+), 2 deletions(-)
>> >
>> > diff --git a/Documentation/filesystems/proc.rst 
>> > b/Documentation/filesystems/proc.rst
>> > index 779c05528e87..591f12d30d97 100644
>> > --- a/Documentation/filesystems/proc.rst
>> > +++ b/Documentation/filesystems/proc.rst
>> > @@ -1886,14 +1886,16 @@ if precise results are needed.
>> >  3.8  /proc//fdinfo/ - Information about opened file
>> >  ---
>> >  This file provides information associated with an opened file. The regular
>> > -files have at least five fields -- 'pos', 'flags', 'mnt_id', 'ino', and 
>> > 'size'.
>> > +files have at least six fields -- 'pos', 'flags', 'mnt_id', 'ino', 'size',
>> > +and 'path'.
>> >
>> >  The 'pos' represents the current offset of the opened file in decimal
>> >  form [see lseek(2) for details], 'flags' denotes the octal O_xxx mask the
>> >  file has been created with [see open(2) for details] and 'mnt_id' 
>> > represents
>> >  mount ID of the file system containing the opened file [see 3.5
>> >  /proc//mountinfo for details]. 'ino' represents the inode number of
>> > -the file, and 'size' represents the size of the file in bytes.
>> > +the file, 'size' represents the size of the file in bytes, and 'path'
>> > +represents the file path.
>> >
>> >  A typical output is::
>> >
>> > @@ -1902,6 +1904,7 @@ A typical output is::
>> >   mnt_id: 19
>> >   ino:63107
>> >   size:   0
>> > + path:   /dev/null
>> >
>> >  All locks associated with a file descriptor are shown in its fdinfo too::
>> >
>> > @@ -1920,6 +1923,7 @@ Eventfd files
>> >   mnt_id: 9
>> >   ino:63107
>> >   size:   0
>> > + path:   anon_inode:[eventfd]
>> >   eventfd-count:  5a
>> >
>> >  where 'eventfd-count' is hex value of a counter.
>> > @@ -1934,6 +1938,7 @@ Signalfd files
>> >   mnt_id: 9
>> >   ino:63107
>> >   size:   0
>> > + path:   anon_inode:[signalfd]
>> >   sigmask:0200
>> >
>> >  where 'sigmask' is hex value of the signal mask associated
>> > @@ -1949,6 +1954,7 @@ Epoll files
>> >   mnt_id: 9
>> >   ino:63107
>> >   size:   0
>> > + path:   anon_inode:[eventpoll]
>> >   tfd:5 events:   1d data:  pos:0 ino:61af 
>> > sdev:7
>> >
>> >  where 'tfd' is a target file descriptor number in decimal form,
>> > @@ -1968,6 +1974,7 @@ For inotify files the format is the following::
>> >   mnt_id: 9
>> >   ino:63107
>> >   size:   0
>> > + path:   anon_inode:inotify
>> >   inotify wd:3 ino:9e7e