Re: [PATCH v3 2/2] dmabuf: Add dmabuf inode number to /proc/*/fdinfo

2021-02-05 Thread Kalesh Singh
On Fri, Feb 5, 2021 at 2:56 AM Christian König  wrote:
>
> Am 05.02.21 um 03:23 schrieb Kalesh Singh:
> > If a FD refers to a DMA buffer add the DMA buffer inode number to
> > /proc//fdinfo/ and /proc//task//fdindo/.
> >
> > The dmabuf inode number allows userspace to uniquely identify the buffer
> > and avoids a dependency on /proc//fd/* when accounting per-process
> > DMA buffer sizes.
>
> BTW: Why do we make this DMA-buf specific? Couldn't we always print the
> inode number for all fds?

Good point. We can make this a common field instead of DMA buf
specific. I will update in the next version.

Thanks,
Kalesh
>
> Regards,
> Christian.
>
> >
> > Signed-off-by: Kalesh Singh 
> > ---
> > Changes in v3:
> >- Add documentation in proc.rst
> > Changes in v2:
> >- Update patch description
> >
> >   Documentation/filesystems/proc.rst | 17 +
> >   drivers/dma-buf/dma-buf.c  |  1 +
> >   2 files changed, 18 insertions(+)
> >
> > diff --git a/Documentation/filesystems/proc.rst 
> > b/Documentation/filesystems/proc.rst
> > index 2fa69f710e2a..fdd38676f57f 100644
> > --- a/Documentation/filesystems/proc.rst
> > +++ b/Documentation/filesystems/proc.rst
> > @@ -2031,6 +2031,23 @@ details]. 'it_value' is remaining time until the 
> > timer expiration.
> >   with TIMER_ABSTIME option which will be shown in 'settime flags', but 
> > 'it_value'
> >   still exhibits timer's remaining time.
> >
> > +DMA Buffer files
> > +
> > +
> > +::
> > +
> > + pos:0
> > + flags:  04002
> > + mnt_id: 9
> > + dmabuf_inode_no: 63107
> > + size:   32768
> > + count:  2
> > + exp_name:  system-heap
> > +
> > +where 'dmabuf_inode_no' is the unique inode number of the DMA buffer file.
> > +'size' is the size of the DMA buffer in bytes. 'count' is the file count of
> > +the DMA buffer file. 'exp_name' is the name of the DMA buffer exporter.
> > +
> >   3.9 /proc//map_files - Information about memory mapped files
> >   -
> >   This directory contains symbolic links which represent memory mapped files
> > diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
> > index 9ad6397aaa97..d869099ede83 100644
> > --- a/drivers/dma-buf/dma-buf.c
> > +++ b/drivers/dma-buf/dma-buf.c
> > @@ -414,6 +414,7 @@ static void dma_buf_show_fdinfo(struct seq_file *m, 
> > struct file *file)
> >   {
> >   struct dma_buf *dmabuf = file->private_data;
> >
> > + seq_printf(m, "dmabuf_inode_no:\t%lu\n", file_inode(file)->i_ino);
> >   seq_printf(m, "size:\t%zu\n", dmabuf->size);
> >   /* Don't count the temporary reference taken inside procfs seq_show */
> >   seq_printf(m, "count:\t%ld\n", file_count(dmabuf->file) - 1);
>
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


[PATCH v2 1/2] procfs: Allow reading fdinfo with PTRACE_MODE_READ

2021-02-05 Thread Kalesh Singh
Android captures per-process system memory state when certain low memory
events (e.g a foreground app kill) occur, to identify potential memory
hoggers. In order to measure how much memory a process actually consumes,
it is necessary to include the DMA buffer sizes for that process in the
memory accounting. Since the handle to DMA buffers are raw FDs, it is
important to be able to identify which processes have FD references to
a DMA buffer.

Currently, DMA buffer FDs can be accounted using /proc//fd/* and
/proc//fdinfo -- both are only readable by the process owner,
as follows:
  1. Do a readlink on each FD.
  2. If the target path begins with "/dmabuf", then the FD is a dmabuf FD.
  3. stat the file to get the dmabuf inode number.
  4. Read/ proc//fdinfo/, to get the DMA buffer size.

Accessing other processes’ fdinfo requires root privileges. This limits
the use of the interface to debugging environments and is not suitable
for production builds.  Granting root privileges even to a system process
increases the attack surface and is highly undesirable.

Since fdinfo doesn't permit reading process memory and manipulating
process state, allow accessing fdinfo under PTRACE_MODE_READ_FSCRED.

Suggested-by: Jann Horn 
Signed-off-by: Kalesh Singh 
---

Changes in v2:
  - Update patch desciption

 fs/proc/base.c |  4 ++--
 fs/proc/fd.c   | 15 ++-
 2 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index b3422cda2a91..a37f9de7103f 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -3160,7 +3160,7 @@ static const struct pid_entry tgid_base_stuff[] = {
DIR("task",   S_IRUGO|S_IXUGO, proc_task_inode_operations, 
proc_task_operations),
DIR("fd", S_IRUSR|S_IXUSR, proc_fd_inode_operations, 
proc_fd_operations),
DIR("map_files",  S_IRUSR|S_IXUSR, proc_map_files_inode_operations, 
proc_map_files_operations),
-   DIR("fdinfo", S_IRUSR|S_IXUSR, proc_fdinfo_inode_operations, 
proc_fdinfo_operations),
+   DIR("fdinfo", S_IRUGO|S_IXUGO, proc_fdinfo_inode_operations, 
proc_fdinfo_operations),
DIR("ns", S_IRUSR|S_IXUGO, proc_ns_dir_inode_operations, 
proc_ns_dir_operations),
 #ifdef CONFIG_NET
DIR("net",S_IRUGO|S_IXUGO, proc_net_inode_operations, 
proc_net_operations),
@@ -3504,7 +3504,7 @@ static const struct inode_operations 
proc_tid_comm_inode_operations = {
  */
 static const struct pid_entry tid_base_stuff[] = {
DIR("fd",S_IRUSR|S_IXUSR, proc_fd_inode_operations, 
proc_fd_operations),
-   DIR("fdinfo",S_IRUSR|S_IXUSR, proc_fdinfo_inode_operations, 
proc_fdinfo_operations),
+   DIR("fdinfo",S_IRUGO|S_IXUGO, proc_fdinfo_inode_operations, 
proc_fdinfo_operations),
DIR("ns",S_IRUSR|S_IXUGO, proc_ns_dir_inode_operations, 
proc_ns_dir_operations),
 #ifdef CONFIG_NET
DIR("net",S_IRUGO|S_IXUGO, proc_net_inode_operations, 
proc_net_operations),
diff --git a/fs/proc/fd.c b/fs/proc/fd.c
index cb51763ed554..585e213301f9 100644
--- a/fs/proc/fd.c
+++ b/fs/proc/fd.c
@@ -6,6 +6,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -72,6 +73,18 @@ static int seq_show(struct seq_file *m, void *v)
 
 static int seq_fdinfo_open(struct inode *inode, struct file *file)
 {
+   bool allowed = false;
+   struct task_struct *task = get_proc_task(inode);
+
+   if (!task)
+   return -ESRCH;
+
+   allowed = ptrace_may_access(task, PTRACE_MODE_READ_FSCREDS);
+   put_task_struct(task);
+
+   if (!allowed)
+   return -EACCES;
+
return single_open(file, seq_show, inode);
 }
 
@@ -307,7 +320,7 @@ static struct dentry *proc_fdinfo_instantiate(struct dentry 
*dentry,
struct proc_inode *ei;
struct inode *inode;
 
-   inode = proc_pid_make_inode(dentry->d_sb, task, S_IFREG | S_IRUSR);
+   inode = proc_pid_make_inode(dentry->d_sb, task, S_IFREG | S_IRUGO);
if (!inode)
return ERR_PTR(-ENOENT);
 
-- 
2.30.0.365.g02bc693789-goog

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


[PATCH v3 1/2] procfs: Allow reading fdinfo with PTRACE_MODE_READ

2021-02-05 Thread Kalesh Singh
Android captures per-process system memory state when certain low memory
events (e.g a foreground app kill) occur, to identify potential memory
hoggers. In order to measure how much memory a process actually consumes,
it is necessary to include the DMA buffer sizes for that process in the
memory accounting. Since the handle to DMA buffers are raw FDs, it is
important to be able to identify which processes have FD references to
a DMA buffer.

Currently, DMA buffer FDs can be accounted using /proc//fd/* and
/proc//fdinfo -- both are only readable by the process owner,
as follows:
  1. Do a readlink on each FD.
  2. If the target path begins with "/dmabuf", then the FD is a dmabuf FD.
  3. stat the file to get the dmabuf inode number.
  4. Read/ proc//fdinfo/, to get the DMA buffer size.

Accessing other processes’ fdinfo requires root privileges. This limits
the use of the interface to debugging environments and is not suitable
for production builds.  Granting root privileges even to a system process
increases the attack surface and is highly undesirable.

Since fdinfo doesn't permit reading process memory and manipulating
process state, allow accessing fdinfo under PTRACE_MODE_READ_FSCRED.

Suggested-by: Jann Horn 
Signed-off-by: Kalesh Singh 
---
Changes in v2:
  - Update patch description

 fs/proc/base.c |  4 ++--
 fs/proc/fd.c   | 15 ++-
 2 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index b3422cda2a91..a37f9de7103f 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -3160,7 +3160,7 @@ static const struct pid_entry tgid_base_stuff[] = {
DIR("task",   S_IRUGO|S_IXUGO, proc_task_inode_operations, 
proc_task_operations),
DIR("fd", S_IRUSR|S_IXUSR, proc_fd_inode_operations, 
proc_fd_operations),
DIR("map_files",  S_IRUSR|S_IXUSR, proc_map_files_inode_operations, 
proc_map_files_operations),
-   DIR("fdinfo", S_IRUSR|S_IXUSR, proc_fdinfo_inode_operations, 
proc_fdinfo_operations),
+   DIR("fdinfo", S_IRUGO|S_IXUGO, proc_fdinfo_inode_operations, 
proc_fdinfo_operations),
DIR("ns", S_IRUSR|S_IXUGO, proc_ns_dir_inode_operations, 
proc_ns_dir_operations),
 #ifdef CONFIG_NET
DIR("net",S_IRUGO|S_IXUGO, proc_net_inode_operations, 
proc_net_operations),
@@ -3504,7 +3504,7 @@ static const struct inode_operations 
proc_tid_comm_inode_operations = {
  */
 static const struct pid_entry tid_base_stuff[] = {
DIR("fd",S_IRUSR|S_IXUSR, proc_fd_inode_operations, 
proc_fd_operations),
-   DIR("fdinfo",S_IRUSR|S_IXUSR, proc_fdinfo_inode_operations, 
proc_fdinfo_operations),
+   DIR("fdinfo",S_IRUGO|S_IXUGO, proc_fdinfo_inode_operations, 
proc_fdinfo_operations),
DIR("ns",S_IRUSR|S_IXUGO, proc_ns_dir_inode_operations, 
proc_ns_dir_operations),
 #ifdef CONFIG_NET
DIR("net",S_IRUGO|S_IXUGO, proc_net_inode_operations, 
proc_net_operations),
diff --git a/fs/proc/fd.c b/fs/proc/fd.c
index cb51763ed554..585e213301f9 100644
--- a/fs/proc/fd.c
+++ b/fs/proc/fd.c
@@ -6,6 +6,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -72,6 +73,18 @@ static int seq_show(struct seq_file *m, void *v)
 
 static int seq_fdinfo_open(struct inode *inode, struct file *file)
 {
+   bool allowed = false;
+   struct task_struct *task = get_proc_task(inode);
+
+   if (!task)
+   return -ESRCH;
+
+   allowed = ptrace_may_access(task, PTRACE_MODE_READ_FSCREDS);
+   put_task_struct(task);
+
+   if (!allowed)
+   return -EACCES;
+
return single_open(file, seq_show, inode);
 }
 
@@ -307,7 +320,7 @@ static struct dentry *proc_fdinfo_instantiate(struct dentry 
*dentry,
struct proc_inode *ei;
struct inode *inode;
 
-   inode = proc_pid_make_inode(dentry->d_sb, task, S_IFREG | S_IRUSR);
+   inode = proc_pid_make_inode(dentry->d_sb, task, S_IFREG | S_IRUGO);
if (!inode)
return ERR_PTR(-ENOENT);
 
-- 
2.30.0.478.g8a0d178c01-goog

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH 0/2] Allow reading process DMA buf stats from fdinfo

2021-02-04 Thread Kalesh Singh
On Thu, Jan 28, 2021 at 1:24 PM Kalesh Singh  wrote:
>
> Android captures per-process system memory state when certain low memory
> events (e.g a foreground app kill) occur, to identify potential memory
> hoggers. In order to measure how much memory a process actually consumes,
> it is necessary to include the DMA buffer sizes for that process in the
> memory accounting. Since the handle to DMA buffers are raw FDs, it is
> important to be able to identify which processes have FD references to
> a DMA buffer.
>
> Currently, DMA buffer FDs can be accounted using /proc//fd/* and
> /proc//fdinfo -- both are only readable by the process owner,
> as follows:
>   1. Do a readlink on each FD.
>   2. If the target path begins with "/dmabuf", then the FD is a dmabuf FD.
>   3. stat the file to get the dmabuf inode number.
>   4. Read/ proc//fdinfo/, to get the DMA buffer size.
>
> Accessing other processes’ fdinfo requires root privileges. This limits
> the use of the interface to debugging environments and is not suitable
> for production builds.  Granting root privileges even to a system process
> increases the attack surface and is highly undesirable.
>
> This series proposes making the requirement to read fdinfo less strict
> with PTRACE_MODE_READ.
>

Hi everyone,

I will send v2 of this patch series. Please let me know if you have
any other comments or feedback, that should be addressed in the new
version.

Thanks,
Kalesh

> Kalesh Singh (2):
>   procfs: Allow reading fdinfo with PTRACE_MODE_READ
>   dmabuf: Add dmabuf inode no to fdinfo
>
>  drivers/dma-buf/dma-buf.c |  1 +
>  fs/proc/base.c|  4 ++--
>  fs/proc/fd.c  | 15 ++-
>  3 files changed, 17 insertions(+), 3 deletions(-)
>
> --
> 2.30.0.365.g02bc693789-goog
>
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


[PATCH 0/2] Allow reading process DMA buf stats from fdinfo

2021-01-28 Thread Kalesh Singh
Android captures per-process system memory state when certain low memory
events (e.g a foreground app kill) occur, to identify potential memory
hoggers. In order to measure how much memory a process actually consumes,
it is necessary to include the DMA buffer sizes for that process in the
memory accounting. Since the handle to DMA buffers are raw FDs, it is
important to be able to identify which processes have FD references to
a DMA buffer.

Currently, DMA buffer FDs can be accounted using /proc//fd/* and
/proc//fdinfo -- both are only readable by the process owner,
as follows:
  1. Do a readlink on each FD.
  2. If the target path begins with "/dmabuf", then the FD is a dmabuf FD.
  3. stat the file to get the dmabuf inode number.
  4. Read/ proc//fdinfo/, to get the DMA buffer size.

Accessing other processes’ fdinfo requires root privileges. This limits
the use of the interface to debugging environments and is not suitable
for production builds.  Granting root privileges even to a system process
increases the attack surface and is highly undesirable.

This series proposes making the requirement to read fdinfo less strict
with PTRACE_MODE_READ.


Kalesh Singh (2):
  procfs: Allow reading fdinfo with PTRACE_MODE_READ
  dmabuf: Add dmabuf inode no to fdinfo

 drivers/dma-buf/dma-buf.c |  1 +
 fs/proc/base.c|  4 ++--
 fs/proc/fd.c  | 15 ++-
 3 files changed, 17 insertions(+), 3 deletions(-)

-- 
2.30.0.365.g02bc693789-goog

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


[PATCH 2/2] dmabuf: Add dmabuf inode no to fdinfo

2021-01-28 Thread Kalesh Singh
The dmabuf inode number allows userspace to uniquely identify the buffer
and avoids a dependency on /proc//fd/* when accounting per-process
DMA buffer sizes.

Signed-off-by: Kalesh Singh 
---
 drivers/dma-buf/dma-buf.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index 9ad6397aaa97..d869099ede83 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -414,6 +414,7 @@ static void dma_buf_show_fdinfo(struct seq_file *m, struct 
file *file)
 {
struct dma_buf *dmabuf = file->private_data;
 
+   seq_printf(m, "dmabuf_inode_no:\t%lu\n", file_inode(file)->i_ino);
seq_printf(m, "size:\t%zu\n", dmabuf->size);
/* Don't count the temporary reference taken inside procfs seq_show */
seq_printf(m, "count:\t%ld\n", file_count(dmabuf->file) - 1);
-- 
2.30.0.365.g02bc693789-goog

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH] procfs/dmabuf: Add /proc//task//dmabuf_fds

2021-01-28 Thread Kalesh Singh
On Wed, Jan 27, 2021 at 5:47 AM Jann Horn  wrote:
>
> +jeffv from Android
>
> On Tue, Jan 26, 2021 at 11:51 PM Kalesh Singh  wrote:
> > In order to measure how much memory a process actually consumes, it is
> > necessary to include the DMA buffer sizes for that process in the memory
> > accounting. Since the handle to DMA buffers are raw FDs, it is important
> > to be able to identify which processes have FD references to a DMA buffer.
>
> Or you could try to let the DMA buffer take a reference on the
> mm_struct and account its size into the mm_struct? That would probably
> be nicer to work with than having to poke around in procfs separately
> for DMA buffers.
>
> > Currently, DMA buffer FDs can be accounted using /proc//fd/* and
> > /proc//fdinfo -- both of which are only root readable, as follows:
>
> That's not quite right. They can both also be accessed by the user
> owning the process. Also, fdinfo is a standard interface for
> inspecting process state that doesn't permit reading process memory or
> manipulating process state - so I think it would be fine to permit
> access to fdinfo under a PTRACE_MODE_READ_FSCRED check, just like the
> interface you're suggesting.


Hi everyone. Thank you for the feedback.

I understand there is a deeper problem of accounting shared memory in
the kernel, that’s not only specific to the DMA buffers. In this case
DMA buffers, I think Jann’s proposal is the cleanest way to attribute
the shared buffers to processes. I can respin a patch modifying fdinfo
as suggested, if this is not an issue from a security perspective.

Thanks,
Kalesh

>
>
> >   1. Do a readlink on each FD.
> >   2. If the target path begins with "/dmabuf", then the FD is a dmabuf FD.
> >   3. stat the file to get the dmabuf inode number.
> >   4. Read/ proc//fdinfo/, to get the DMA buffer size.
> >
> > Android captures per-process system memory state when certain low memory
> > events (e.g a foreground app kill) occur, to identify potential memory
> > hoggers. To include a process’s dmabuf usage as part of its memory state,
> > the data collection needs to be fast enough to reflect the memory state at
> > the time of such events.
> >
> > Since reading /proc//fd/ and /proc//fdinfo/ requires root
> > privileges, this approach is not suitable for production builds.
>
> It should be easy to add enough information to /proc//fdinfo/ so
> that you don't need to look at /proc//fd/ anymore.
>
> > Granting
> > root privileges even to a system process increases the attack surface and
> > is highly undesirable. Additionally this is slow as it requires many
> > context switches for searching and getting the dma-buf info.
>
> What do you mean by "context switches"? Task switches or kernel/user
> transitions (e.g. via syscall)?
>
> > With the addition of per-buffer dmabuf stats in sysfs [1], the DMA buffer
> > details can be queried using their unique inode numbers.
> >
> > This patch proposes adding a /proc//task//dmabuf_fds interface.
> >
> > /proc//task//dmabuf_fds contains a list of inode numbers for
> > every DMA buffer FD that the task has. Entries with the same inode
> > number can appear more than once, indicating the total FD references
> > for the associated DMA buffer.
> >
> > If a thread shares the same files as the group leader then its
> > dmabuf_fds file will be empty, as these dmabufs are reported by the
> > group leader.
> >
> > The interface requires PTRACE_MODE_READ_FSCRED (same as /proc//maps)
> > and allows the efficient accounting of per-process DMA buffer usage without
> > requiring root privileges. (See data below)
>
> I'm not convinced that introducing a new procfs file for this is the
> right way to go. And the idea of having to poke into multiple
> different files in procfs and in sysfs just to be able to compute a
> proper memory usage score for a process seems weird to me. "How much
> memory is this process using" seems like the kind of question the
> kernel ought to be able to answer (and the kernel needs to be able to
> answer somewhat accurately so that its own OOM killer can do its job
> properly)?
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


[PATCH] procfs/dmabuf: Add /proc//task//dmabuf_fds

2021-01-27 Thread Kalesh Singh
In order to measure how much memory a process actually consumes, it is
necessary to include the DMA buffer sizes for that process in the memory
accounting. Since the handle to DMA buffers are raw FDs, it is important
to be able to identify which processes have FD references to a DMA buffer.

Currently, DMA buffer FDs can be accounted using /proc//fd/* and
/proc//fdinfo -- both of which are only root readable, as follows:
  1. Do a readlink on each FD.
  2. If the target path begins with "/dmabuf", then the FD is a dmabuf FD.
  3. stat the file to get the dmabuf inode number.
  4. Read/ proc//fdinfo/, to get the DMA buffer size.

Android captures per-process system memory state when certain low memory
events (e.g a foreground app kill) occur, to identify potential memory
hoggers. To include a process’s dmabuf usage as part of its memory state,
the data collection needs to be fast enough to reflect the memory state at
the time of such events.

Since reading /proc//fd/ and /proc//fdinfo/ requires root
privileges, this approach is not suitable for production builds. Granting
root privileges even to a system process increases the attack surface and
is highly undesirable. Additionally this is slow as it requires many
context switches for searching and getting the dma-buf info.

With the addition of per-buffer dmabuf stats in sysfs [1], the DMA buffer
details can be queried using their unique inode numbers.

This patch proposes adding a /proc//task//dmabuf_fds interface.

/proc//task//dmabuf_fds contains a list of inode numbers for
every DMA buffer FD that the task has. Entries with the same inode
number can appear more than once, indicating the total FD references
for the associated DMA buffer.

If a thread shares the same files as the group leader then its
dmabuf_fds file will be empty, as these dmabufs are reported by the
group leader.

The interface requires PTRACE_MODE_READ_FSCRED (same as /proc//maps)
and allows the efficient accounting of per-process DMA buffer usage without
requiring root privileges. (See data below)

Performance Comparison:
---

The following data compares the time to capture the sizes of all DMA
buffers referenced by FDs for all processes on an arm64 android device.

---
   |  Core 0 (Little)  |  Core 7 (Big) |
---
From /fdinfo  |  318 ms   | 145 ms|
---
Inodes from|  114 ms   |  27 ms|
dmabuf_fds;|(2.8x  ^)  |   (5.4x  ^)   |
data from sysfs|   |   |
---

It can be inferred that in the worst case there is a 2.8x speedup for
determining per-process DMA buffer FD sizes, when using the proposed
interfaces.

[1] https://lore.kernel.org/dri-devel/20210119225723.33-1-hri...@google.com/

Signed-off-by: Kalesh Singh 
---
 Documentation/filesystems/proc.rst |  30 ++
 drivers/dma-buf/dma-buf.c  |   7 +-
 fs/proc/Makefile   |   1 +
 fs/proc/base.c |   1 +
 fs/proc/dma_bufs.c | 159 +
 fs/proc/internal.h |   1 +
 include/linux/dma-buf.h|   5 +
 7 files changed, 198 insertions(+), 6 deletions(-)
 create mode 100644 fs/proc/dma_bufs.c

diff --git a/Documentation/filesystems/proc.rst 
b/Documentation/filesystems/proc.rst
index 2fa69f710e2a..757dd47ab679 100644
--- a/Documentation/filesystems/proc.rst
+++ b/Documentation/filesystems/proc.rst
@@ -47,6 +47,7 @@ fixes/update part 1.1  Stefani Seibold   
  June 9 2009
   3.10  /proc//timerslack_ns - Task timerslack value
   3.11 /proc//patch_state - Livepatch patch operation state
   3.12 /proc//arch_status - Task architecture specific information
+  3.13 /proc//task//dmabuf_fds - DMA buffers referenced by an FD
 
   4Configuring procfs
   4.1  Mount options
@@ -2131,6 +2132,35 @@ AVX512_elapsed_ms
   the task is unlikely an AVX512 user, but depends on the workload and the
   scheduling scenario, it also could be a false negative mentioned above.
 
+3.13 /proc//task//dmabuf_fds - DMA buffers referenced by an FD
+-
+This file  exposes a list of the inode numbers for every DMA buffer
+FD that the task has.
+
+The same inode number can appear more than once, indicating the total
+FD references for the associated DMA buffer.
+
+The inode number can be used to lookup the DMA buffer information in
+the sysfs interface /sys/kernel/dmabuf/buffers//.
+
+Example Output
+~~
+$ cat /proc/612/task/612/dmabuf_fds
+30972 30973 45678 49326
+
+Permission to access this file is governed by a ptrace access mode
+PTRACE_MODE_READ_FSCREDS.
+
+Threads can have different files when created without specifying
+the CLONE_

Re: [PATCH 2/2] dmabuf/tracing: Add dma-buf trace events

2020-08-05 Thread Kalesh Singh
On Tue, Aug 04, 2020 at 02:09:13AM +0100, Al Viro wrote:
> On Mon, Aug 03, 2020 at 11:28:31PM +0100, Al Viro wrote:
> 
> > IOW, what the hell is that horror for?  You do realize, for example, that 
> > there's
> > such thing as dup(), right?  And dup2() as well.  And while we are at it, 
> > how
> > do you keep track of removals, considering the fact that you can stick a 
> > file
> > reference into SCM_RIGHTS datagram sent to yourself, close descriptors and 
> > an hour
> > later pick that datagram, suddenly getting descriptor back?
> > 
> > Besides, "I have no descriptors left" != "I can't be currently sitting in 
> > the middle
> > of syscall on that sucker"; close() does *NOT* terminate ongoing operations.
> > 
> > You are looking at the drastically wrong abstraction level.  Please, 
> > describe what
> > it is that you are trying to achieve.

Hi Al. Thank you for the comments. Ultimately what we need is to identify 
processes
that hold a file reference to the dma-buf. Unfortunately we can't use only
explicit dma_buf_get/dma_buf_put to track them because when an FD is being 
shared
between processes the file references are taken implicitly.

For example, on the sender side:
   unix_dgram_sendmsg -> send_scm -> __send_scm -> scm_fp_copy -> fget_raw
and on the receiver side:
   unix_dgram_recvmsg -> scm_recv -> scm_detach_fds -> __scm_install_fd -> 
get_file

I understand now that fd_install is not an appropriate abstraction level to 
track these.
Is there a more appropriate alternative where we could use to track these 
implicit file
references?

> _IF_ it's "who keeps a particularly long-lived sucker pinned", I would suggest
> fuser(1) run when you detect that kind of long-lived dmabuf.  With events 
> generated
> by their constructors and destructors, and detection of longevity done based 
> on
> that.
> 
> But that's only a semi-blind guess at the things you are trying to achieve; 
> please,
> describe what it really is.
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH 2/2] dmabuf/tracing: Add dma-buf trace events

2020-08-04 Thread Kalesh Singh
On Mon, Aug 03, 2020 at 11:32:39AM -0400, Steven Rostedt wrote:
> On Mon,  3 Aug 2020 14:47:19 +
> Kalesh Singh  wrote:
> 
> > +DECLARE_EVENT_CLASS(dma_buf_ref_template,
> > +
> > +   TP_PROTO(struct task_struct *task, struct file *filp),
> > +
> > +   TP_ARGS(task,  filp),
> > +
> > +   TP_STRUCT__entry(
> > +   __field(u32, tgid)
> > +   __field(u32, pid)
> 
> I only see "current" passed in as "task". Why are you recording the pid
> and tgid as these are available by the tracing infrastructure.
> 
> At least the pid is saved at every event. You can see the tgid when
> enabling the "record_tgid".
> 
>  # trace-cmd start -e all -O record_tgid
>  # trace-cmd show
> 
> # tracer: nop
> #
> # entries-in-buffer/entries-written: 39750/39750   #P:8
> #
> #  _-=> irqs-off
> # / _=> need-resched
> #| / _---=> hardirq/softirq
> #|| / _--=> preempt-depth
> #||| / delay
> #   TASK-PIDTGID   CPU#  TIMESTAMP  FUNCTION
> #  | ||  |      | |
>trace-cmd-28284 (28284) [005]  240338.934671: sys_exit: NR 1 = 1
>  kworker/3:2-27891 (27891) [003] d... 240338.934671: timer_start: 
> timer=d643debd function=delayed_work_timer_fn expires=4535008893 
> [timeout=1981] cpu=3 idx=186 flags=I
>trace-cmd-28284 (28284) [005]  240338.934672: sys_write -> 0x1
>  kworker/3:2-27891 (27891) [003]  240338.934672: 
> workqueue_execute_end: work struct 8fddd403: function psi_avgs_work
>  kworker/3:2-27891 (27891) [003]  240338.934673: 
> workqueue_execute_start: work struct 111c941e: function 
> dbs_work_handler
>  kworker/3:2-27891 (27891) [003]  240338.934673: 
> workqueue_execute_end: work struct 111c941e: function dbs_work_handler
>  kworker/3:2-27891 (27891) [003] d... 240338.934673: rcu_utilization: 
> Start context switch
>  kworker/3:2-27891 (27891) [003] d... 240338.934673: rcu_utilization: End 
> context switch
> 
> -- Steve
> 
Thanks for the comments Steve. I'll remove the task arg.

> > +   __field(u64, size)
> > +   __field(s64, count)
> > +   __string(exp_name, dma_buffer(filp)->exp_name)
> > +   __string(name, dma_buffer(filp)->name ? dma_buffer(filp)->name 
> > : UNKNOWN)
> > +   __field(u64, i_ino)
> > +   ),
> > +
> > +   TP_fast_assign(
> > +   __entry->tgid = task->tgid;
> > +   __entry->pid = task->pid;
> > +   __entry->size = dma_buffer(filp)->size;
> > +   __entry->count = file_count(filp);
> > +   __assign_str(exp_name, dma_buffer(filp)->exp_name);
> > +   __assign_str(name, dma_buffer(filp)->name ? 
> > dma_buffer(filp)->name : UNKNOWN);
> > +   __entry->i_ino = filp->f_inode->i_ino;
> > +   ),
> > +
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH 1/2] fs: Add fd_install file operation

2020-08-04 Thread Kalesh Singh
On Mon, Aug 03, 2020 at 05:34:29PM +0100, Christoph Hellwig wrote:
> On Mon, Aug 03, 2020 at 02:47:18PM +0000, Kalesh Singh wrote:
> > Provides a per process hook for the acquisition of file descriptors,
> > despite the method used to obtain the descriptor.
> > 
> > Signed-off-by: Kalesh Singh 
> 
> I strongly disagree with this.  The file operation has no business
> hooking into installing the fd.
Hi Christoph. I am exploring the alternative suggested by Matthew in
Patch 2/2. Thanks :)
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel