Re: [PATCH 0/5] RFC: connector: Add network namespace awareness

2020-07-13 Thread Matt Bennett
On Tue, 2020-07-14 at 15:03 +1000, Aleksa Sarai wrote:
> On 2020-07-13, Eric W. Biederman  wrote:
> > Matt Bennett  writes:
> > 
> > > On Thu, 2020-07-02 at 21:10 +0200, Christian Brauner wrote:
> > > > On Thu, Jul 02, 2020 at 08:17:38AM -0500, Eric W. Biederman wrote:
> > > > > Matt Bennett  writes:
> > > > > 
> > > > > > Previously the connector functionality could only be used by 
> > > > > > processes running in the
> > > > > > default network namespace. This meant that any process that uses 
> > > > > > the connector functionality
> > > > > > could not operate correctly when run inside a container. This is a 
> > > > > > draft patch series that
> > > > > > attempts to now allow this functionality outside of the default 
> > > > > > network namespace.
> > > > > > 
> > > > > > I see this has been discussed previously [1], but am not sure how 
> > > > > > my changes relate to all
> > > > > > of the topics discussed there and/or if there are any unintended 
> > > > > > side effects from my draft
> > > > > > changes.
> > > > > 
> > > > > Is there a piece of software that uses connector that you want to get
> > > > > working in containers?
> > > 
> > > We have an IPC system [1] where processes can register their socket
> > > details (unix, tcp, tipc, ...) to a 'monitor' process. Processes can
> > > then get notified when other processes they are interested in
> > > start/stop their servers and use the registered details to connect to
> > > them. Everything works unless a process crashes, in which case the
> > > monitoring process never removes their details. Therefore the
> > > monitoring process uses the connector functionality with
> > > PROC_EVENT_EXIT to detect when a process crashes and removes the
> > > details if it is a previously registered PID.
> > > 
> > > This was working for us until we tried to run our system in a container.
> > > 
> > > > > 
> > > > > I am curious what the motivation is because up until now there has 
> > > > > been
> > > > > nothing very interesting using this functionality.  So it hasn't been
> > > > > worth anyone's time to make the necessary changes to the code.
> > > > 
> > > > Imho, we should just state once and for all that the proc connector will
> > > > not be namespaced. This is such a corner-case thing and has been
> > > > non-namespaced for such a long time without consistent push for it to be
> > > > namespaced combined with the fact that this needs quite some code to
> > > > make it work correctly that I fear we end up buying more bugs than we're
> > > > selling features. And realistically, you and I will end up maintaining
> > > > this and I feel this is not worth the time(?). Maybe I'm being too
> > > > pessimistic though.
> > > > 
> > > 
> > > Fair enough. I can certainly look for another way to detect process
> > > crashes. Interestingly I found a patch set [2] on the mailing list
> > > that attempts to solve the problem I wish to solve, but it doesn't
> > > look like the patches were ever developed further. From reading the
> > > discussion thread on that patch set it appears that I should be doing
> > > some form of polling on the /proc files.
> > 
> > Recently Christian Brauner implemented pidfd complete with a poll
> > operation that reports when a process terminates.
> > 
> > If you are willing to change your userspace code switching to pidfd
> > should be all that you need.
> 
> While this does solve the problem of getting exit notifications in
> general, you cannot get the exit code. But if they don't care about that
> then we can solve that problem another time. :D
> 

From first glance using pidfd will do exactly what we need. Not being able
to get the exit code will not be an issue. In fact I think it will be an
improvement over the connector as the listener will now only be waiting for
the PIDs we actually care about - rather than getting woken up on every single
process exit and having to check if it cares about the PID.

Many thanks Eric and others,
Matt




Re: [PATCH 0/5] RFC: connector: Add network namespace awareness

2020-07-05 Thread Matt Bennett
On Thu, 2020-07-02 at 21:10 +0200, Christian Brauner wrote:
> On Thu, Jul 02, 2020 at 08:17:38AM -0500, Eric W. Biederman wrote:
> > Matt Bennett  writes:
> > 
> > > Previously the connector functionality could only be used by processes 
> > > running in the
> > > default network namespace. This meant that any process that uses the 
> > > connector functionality
> > > could not operate correctly when run inside a container. This is a draft 
> > > patch series that
> > > attempts to now allow this functionality outside of the default network 
> > > namespace.
> > > 
> > > I see this has been discussed previously [1], but am not sure how my 
> > > changes relate to all
> > > of the topics discussed there and/or if there are any unintended side 
> > > effects from my draft
> > > changes.
> > 
> > Is there a piece of software that uses connector that you want to get
> > working in containers?

We have an IPC system [1] where processes can register their socket details 
(unix, tcp, tipc, ...) to a 'monitor' process. Processes can then get
notified  when other processes they are interested in start/stop their servers 
and use the registered details to connect to them. Everything works
unless a process crashes, in which case the monitoring process never removes 
their details. Therefore the monitoring process uses the connector
functionality with PROC_EVENT_EXIT to detect when a process crashes and removes 
the details if it is a previously registered PID.

This was working for us until we tried to run our system in a container.

> > 
> > I am curious what the motivation is because up until now there has been
> > nothing very interesting using this functionality.  So it hasn't been
> > worth anyone's time to make the necessary changes to the code.
> 
> Imho, we should just state once and for all that the proc connector will
> not be namespaced. This is such a corner-case thing and has been
> non-namespaced for such a long time without consistent push for it to be
> namespaced combined with the fact that this needs quite some code to
> make it work correctly that I fear we end up buying more bugs than we're
> selling features. And realistically, you and I will end up maintaining
> this and I feel this is not worth the time(?). Maybe I'm being too
> pessimistic though.
> 

Fair enough. I can certainly look for another way to detect process crashes. 
Interestingly I found a patch set [2] on the mailing list that attempts
to solve the problem I wish to solve, but it doesn't look like the patches were 
ever developed further. From reading the discussion thread on that
patch set it appears that I should be doing some form of polling on the /proc 
files.

Best regards,
Matt

[1] 
https://github.com/alliedtelesis/cmsg/blob/master/cmsg/src/service_listener/netlink.c#L61
[2] https://lkml.org/lkml/2018/10/29/638



Re: [PATCH 0/5] RFC: connector: Add network namespace awareness

2020-07-05 Thread Matt Bennett
On Thu, 2020-07-02 at 13:59 -0500, Eric W. Biederman wrote:
> Matt Bennett  writes:
> 
> > Previously the connector functionality could only be used by processes 
> > running in the
> > default network namespace. This meant that any process that uses the 
> > connector functionality
> > could not operate correctly when run inside a container. This is a draft 
> > patch series that
> > attempts to now allow this functionality outside of the default network 
> > namespace.
> > 
> > I see this has been discussed previously [1], but am not sure how my 
> > changes relate to all
> > of the topics discussed there and/or if there are any unintended side
> > effects from my draft
> 
> In a quick skim this patchset does not look like it approaches a correct
> conversion to having code that works in multiple namespaces.
> 
> I will take the changes to proc_id_connector for example.
> You report the values in the callers current namespaces.
> 
> Which means an unprivileged user can create a user namespace and get
> connector to report whichever ids they want to users in another
> namespace.  AKA lie.
> 
> So this appears to make connector completely unreliable.
> 
> Eric
> 

Hi Eric,

Thank you for taking the time to review. I wrote these patches in an attempt to 
show that I was willing to do the work myself rather than simply
asking for someone else to do it for me. The changes worked for my use cases 
when I tested them, but I expected that some of the changes would be
incorrect and that I would need some guidance. I can spend some time to really 
dig in and fully understand the changes I am trying to make (I have
limited kernel development experience) but based on the rest of the discussion 
threads it seems that there is likely no appetite to ever support
namespaces with the connector.

Best regards,
Matt


[PATCH 1/5] connector: Use task pid helpers

2020-07-01 Thread Matt Bennett
In preparation for supporting the connector outside of the default
network namespace we switch to using these helpers now. As the connector
is still only supported in the default namespace this change is a no-op.

Signed-off-by: Matt Bennett 
---
 drivers/connector/cn_proc.c | 48 ++---
 1 file changed, 24 insertions(+), 24 deletions(-)

diff --git a/drivers/connector/cn_proc.c b/drivers/connector/cn_proc.c
index 646ad385e490..36a7823c56ec 100644
--- a/drivers/connector/cn_proc.c
+++ b/drivers/connector/cn_proc.c
@@ -83,11 +83,11 @@ void proc_fork_connector(struct task_struct *task)
ev->what = PROC_EVENT_FORK;
rcu_read_lock();
parent = rcu_dereference(task->real_parent);
-   ev->event_data.fork.parent_pid = parent->pid;
-   ev->event_data.fork.parent_tgid = parent->tgid;
+   ev->event_data.fork.parent_pid = task_pid_vnr(parent);
+   ev->event_data.fork.parent_tgid = task_tgid_vnr(parent);
rcu_read_unlock();
-   ev->event_data.fork.child_pid = task->pid;
-   ev->event_data.fork.child_tgid = task->tgid;
+   ev->event_data.fork.child_pid = task_pid_vnr(task);
+   ev->event_data.fork.child_tgid = task_tgid_vnr(task);
 
memcpy(>id, _proc_event_id, sizeof(msg->id));
msg->ack = 0; /* not used */
@@ -110,8 +110,8 @@ void proc_exec_connector(struct task_struct *task)
memset(>event_data, 0, sizeof(ev->event_data));
ev->timestamp_ns = ktime_get_ns();
ev->what = PROC_EVENT_EXEC;
-   ev->event_data.exec.process_pid = task->pid;
-   ev->event_data.exec.process_tgid = task->tgid;
+   ev->event_data.exec.process_pid = task_pid_vnr(task);
+   ev->event_data.exec.process_tgid = task_tgid_vnr(task);
 
memcpy(>id, _proc_event_id, sizeof(msg->id));
msg->ack = 0; /* not used */
@@ -134,8 +134,8 @@ void proc_id_connector(struct task_struct *task, int 
which_id)
ev = (struct proc_event *)msg->data;
memset(>event_data, 0, sizeof(ev->event_data));
ev->what = which_id;
-   ev->event_data.id.process_pid = task->pid;
-   ev->event_data.id.process_tgid = task->tgid;
+   ev->event_data.id.process_pid = task_pid_vnr(task);
+   ev->event_data.id.process_tgid = task_tgid_vnr(task);
rcu_read_lock();
cred = __task_cred(task);
if (which_id == PROC_EVENT_UID) {
@@ -172,8 +172,8 @@ void proc_sid_connector(struct task_struct *task)
memset(>event_data, 0, sizeof(ev->event_data));
ev->timestamp_ns = ktime_get_ns();
ev->what = PROC_EVENT_SID;
-   ev->event_data.sid.process_pid = task->pid;
-   ev->event_data.sid.process_tgid = task->tgid;
+   ev->event_data.sid.process_pid = task_pid_vnr(task);
+   ev->event_data.sid.process_tgid = task_tgid_vnr(task);
 
memcpy(>id, _proc_event_id, sizeof(msg->id));
msg->ack = 0; /* not used */
@@ -196,11 +196,11 @@ void proc_ptrace_connector(struct task_struct *task, int 
ptrace_id)
memset(>event_data, 0, sizeof(ev->event_data));
ev->timestamp_ns = ktime_get_ns();
ev->what = PROC_EVENT_PTRACE;
-   ev->event_data.ptrace.process_pid  = task->pid;
-   ev->event_data.ptrace.process_tgid = task->tgid;
+   ev->event_data.ptrace.process_pid  = task_pid_vnr(task);
+   ev->event_data.ptrace.process_tgid = task_tgid_vnr(task);
if (ptrace_id == PTRACE_ATTACH) {
-   ev->event_data.ptrace.tracer_pid  = current->pid;
-   ev->event_data.ptrace.tracer_tgid = current->tgid;
+   ev->event_data.ptrace.tracer_pid  = task_pid_vnr(current);
+   ev->event_data.ptrace.tracer_tgid = task_tgid_vnr(current);
} else if (ptrace_id == PTRACE_DETACH) {
ev->event_data.ptrace.tracer_pid  = 0;
ev->event_data.ptrace.tracer_tgid = 0;
@@ -228,8 +228,8 @@ void proc_comm_connector(struct task_struct *task)
memset(>event_data, 0, sizeof(ev->event_data));
ev->timestamp_ns = ktime_get_ns();
ev->what = PROC_EVENT_COMM;
-   ev->event_data.comm.process_pid  = task->pid;
-   ev->event_data.comm.process_tgid = task->tgid;
+   ev->event_data.comm.process_pid  = task_pid_vnr(task);
+   ev->event_data.comm.process_tgid = task_tgid_vnr(task);
get_task_comm(ev->event_data.comm.comm, task);
 
memcpy(>id, _proc_event_id, sizeof(msg->id));
@@ -254,14 +254,14 @@ void proc_coredump_connector(struct task_struct *task)
memset(>event_data, 0, sizeof(ev->event_data));
ev->timestamp_ns = ktime_get_ns();
ev->what = PROC_EVENT_COREDUMP;
-   ev->event_data.coredump.process_pid = task->pid;
-   ev->event_

[PATCH 5/5] connector: Create connector per namespace

2020-07-01 Thread Matt Bennett
Move to storing the connector instance per network namespace. In doing
so the ability to use the connector functionality outside the default
namespace is now available.

Signed-off-by: Matt Bennett 
---
 drivers/connector/cn_proc.c   |  49 ++
 drivers/connector/connector.c | 171 --
 drivers/hv/hv_fcopy.c |   1 +
 include/linux/connector.h |  14 ++-
 include/net/net_namespace.h   |   4 +
 kernel/exit.c |   2 +-
 6 files changed, 190 insertions(+), 51 deletions(-)

diff --git a/drivers/connector/cn_proc.c b/drivers/connector/cn_proc.c
index 9202be177a30..661d921fd146 100644
--- a/drivers/connector/cn_proc.c
+++ b/drivers/connector/cn_proc.c
@@ -17,6 +17,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -37,7 +38,6 @@ static inline struct cn_msg *buffer_to_cn_msg(__u8 *buffer)
return (struct cn_msg *)(buffer + 4);
 }
 
-static atomic_t proc_event_num_listeners = ATOMIC_INIT(0);
 static struct cb_id cn_proc_event_id = { CN_IDX_PROC, CN_VAL_PROC };
 
 /* local_event.count is used as the sequence number of the netlink message */
@@ -51,6 +51,9 @@ static DEFINE_PER_CPU(struct local_event, local_event) = {
 
 static inline void send_msg(struct cn_msg *msg)
 {
+   int ret = 0;
+   struct net *net = current->nsproxy->net_ns;
+
local_lock(_event.lock);
 
msg->seq = __this_cpu_inc_return(local_event.count) - 1;
@@ -62,7 +65,9 @@ static inline void send_msg(struct cn_msg *msg)
 *
 * If cn_netlink_send() fails, the data is not sent.
 */
-   cn_netlink_send(_net, msg, 0, CN_IDX_PROC, GFP_NOWAIT);
+   ret = cn_netlink_send(net, msg, 0, CN_IDX_PROC, GFP_NOWAIT);
+   if (ret == -ESRCH && netlink_has_listeners(net->cdev.nls, CN_IDX_PROC) 
== 0)
+   atomic_set(&(net->cdev.proc_event_num_listeners), 0);
 
local_unlock(_event.lock);
 }
@@ -73,8 +78,9 @@ void proc_fork_connector(struct task_struct *task)
struct proc_event *ev;
__u8 buffer[CN_PROC_MSG_SIZE] __aligned(8);
struct task_struct *parent;
+   struct net *net = current->nsproxy->net_ns;
 
-   if (atomic_read(_event_num_listeners) < 1)
+   if (atomic_read(&(net->cdev.proc_event_num_listeners)) < 1)
return;
 
msg = buffer_to_cn_msg(buffer);
@@ -102,8 +108,9 @@ void proc_exec_connector(struct task_struct *task)
struct cn_msg *msg;
struct proc_event *ev;
__u8 buffer[CN_PROC_MSG_SIZE] __aligned(8);
+   struct net *net = current->nsproxy->net_ns;
 
-   if (atomic_read(_event_num_listeners) < 1)
+   if (atomic_read(&(net->cdev.proc_event_num_listeners)) < 1)
return;
 
msg = buffer_to_cn_msg(buffer);
@@ -127,8 +134,9 @@ void proc_id_connector(struct task_struct *task, int 
which_id)
struct proc_event *ev;
__u8 buffer[CN_PROC_MSG_SIZE] __aligned(8);
const struct cred *cred;
+   struct net *net = current->nsproxy->net_ns;
 
-   if (atomic_read(_event_num_listeners) < 1)
+   if (atomic_read(&(net->cdev.proc_event_num_listeners)) < 1)
return;
 
msg = buffer_to_cn_msg(buffer);
@@ -164,8 +172,9 @@ void proc_sid_connector(struct task_struct *task)
struct cn_msg *msg;
struct proc_event *ev;
__u8 buffer[CN_PROC_MSG_SIZE] __aligned(8);
+   struct net *net = current->nsproxy->net_ns;
 
-   if (atomic_read(_event_num_listeners) < 1)
+   if (atomic_read(&(net->cdev.proc_event_num_listeners)) < 1)
return;
 
msg = buffer_to_cn_msg(buffer);
@@ -188,8 +197,9 @@ void proc_ptrace_connector(struct task_struct *task, int 
ptrace_id)
struct cn_msg *msg;
struct proc_event *ev;
__u8 buffer[CN_PROC_MSG_SIZE] __aligned(8);
+   struct net *net = current->nsproxy->net_ns;
 
-   if (atomic_read(_event_num_listeners) < 1)
+   if (atomic_read(&(net->cdev.proc_event_num_listeners)) < 1)
return;
 
msg = buffer_to_cn_msg(buffer);
@@ -220,8 +230,9 @@ void proc_comm_connector(struct task_struct *task)
struct cn_msg *msg;
struct proc_event *ev;
__u8 buffer[CN_PROC_MSG_SIZE] __aligned(8);
+   struct net *net = current->nsproxy->net_ns;
 
-   if (atomic_read(_event_num_listeners) < 1)
+   if (atomic_read(&(net->cdev.proc_event_num_listeners)) < 1)
return;
 
msg = buffer_to_cn_msg(buffer);
@@ -246,8 +257,9 @@ void proc_coredump_connector(struct task_struct *task)
struct proc_event *ev;
struct task_struct *parent;
__u8 buffer[CN_PROC_MSG_SIZE] __aligned(8);
+   struct net *net = current->nsproxy->net_ns;
 
-   if (atomic_read(_event_num_listeners) < 1)
+   if (atomic_read(&(net->cdev.proc_event

[PATCH 3/5] connector: Ensure callback entry is released

2020-07-01 Thread Matt Bennett
Currently the entry itself appears to be being leaked.

Signed-off-by: Matt Bennett 
---
 drivers/connector/cn_queue.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/connector/cn_queue.c b/drivers/connector/cn_queue.c
index 49295052ba8b..a82ceeb37f26 100644
--- a/drivers/connector/cn_queue.c
+++ b/drivers/connector/cn_queue.c
@@ -132,8 +132,10 @@ void cn_queue_free_dev(struct cn_queue_dev *dev)
struct cn_callback_entry *cbq, *n;
 
spin_lock_bh(>queue_lock);
-   list_for_each_entry_safe(cbq, n, >queue_list, callback_entry)
+   list_for_each_entry_safe(cbq, n, >queue_list, callback_entry) {
list_del(>callback_entry);
+   cn_queue_release_callback(cbq);
+   }
spin_unlock_bh(>queue_lock);
 
while (atomic_read(>refcnt)) {
-- 
2.27.0



[PATCH 4/5] connector: Prepare for supporting multiple namespaces

2020-07-01 Thread Matt Bennett
Extend the existing function definitions / call sites to start
passing the network namespace. For now we still only pass the
default namespace.

Signed-off-by: Matt Bennett 
---
 Documentation/driver-api/connector.rst |  6 +++---
 drivers/connector/cn_proc.c|  5 +++--
 drivers/connector/cn_queue.c   |  5 +++--
 drivers/connector/connector.c  | 21 -
 drivers/hv/hv_utils_transport.c|  6 --
 drivers/md/dm-log-userspace-transfer.c |  6 --
 drivers/video/fbdev/uvesafb.c  |  8 +---
 drivers/w1/w1_netlink.c| 19 +++
 include/linux/connector.h  | 24 
 samples/connector/cn_test.c|  6 --
 10 files changed, 65 insertions(+), 41 deletions(-)

diff --git a/Documentation/driver-api/connector.rst 
b/Documentation/driver-api/connector.rst
index c100c7482289..4fb1f73d76ad 100644
--- a/Documentation/driver-api/connector.rst
+++ b/Documentation/driver-api/connector.rst
@@ -25,9 +25,9 @@ handling, etc...  The Connector driver allows any kernelspace 
agents to use
 netlink based networking for inter-process communication in a significantly
 easier way::
 
-  int cn_add_callback(struct cb_id *id, char *name, void (*callback) (struct 
cn_msg *, struct netlink_skb_parms *));
-  void cn_netlink_send_multi(struct cn_msg *msg, u16 len, u32 portid, u32 
__group, int gfp_mask);
-  void cn_netlink_send(struct cn_msg *msg, u32 portid, u32 __group, int 
gfp_mask);
+  int cn_add_callback(struct cb_id *id, char *name, void (*callback) (struct 
net *, struct cn_msg *, struct netlink_skb_parms *));
+  void cn_netlink_send_multi(struct net *net, struct cn_msg *msg, u16 len, u32 
portid, u32 __group, int gfp_mask);
+  void cn_netlink_send(struct net *net, struct cn_msg *msg, u32 portid, u32 
__group, int gfp_mask);
 
   struct cb_id
   {
diff --git a/drivers/connector/cn_proc.c b/drivers/connector/cn_proc.c
index d90aea555a21..9202be177a30 100644
--- a/drivers/connector/cn_proc.c
+++ b/drivers/connector/cn_proc.c
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -61,7 +62,7 @@ static inline void send_msg(struct cn_msg *msg)
 *
 * If cn_netlink_send() fails, the data is not sent.
 */
-   cn_netlink_send(msg, 0, CN_IDX_PROC, GFP_NOWAIT);
+   cn_netlink_send(_net, msg, 0, CN_IDX_PROC, GFP_NOWAIT);
 
local_unlock(_event.lock);
 }
@@ -343,7 +344,7 @@ static void cn_proc_ack(int err, int rcvd_seq, int rcvd_ack)
  * cn_proc_mcast_ctl
  * @data: message sent from userspace via the connector
  */
-static void cn_proc_mcast_ctl(struct cn_msg *msg,
+static void cn_proc_mcast_ctl(struct net *net, struct cn_msg *msg,
  struct netlink_skb_parms *nsp)
 {
enum proc_cn_mcast_op *mc_op = NULL;
diff --git a/drivers/connector/cn_queue.c b/drivers/connector/cn_queue.c
index a82ceeb37f26..22fdd2b149af 100644
--- a/drivers/connector/cn_queue.c
+++ b/drivers/connector/cn_queue.c
@@ -16,11 +16,12 @@
 #include 
 #include 
 #include 
+#include 
 
 static struct cn_callback_entry *
 cn_queue_alloc_callback_entry(struct cn_queue_dev *dev, const char *name,
  struct cb_id *id,
- void (*callback)(struct cn_msg *,
+ void (*callback)(struct net *, struct cn_msg *,
   struct netlink_skb_parms *))
 {
struct cn_callback_entry *cbq;
@@ -58,7 +59,7 @@ int cn_cb_equal(struct cb_id *i1, struct cb_id *i2)
 
 int cn_queue_add_callback(struct cn_queue_dev *dev, const char *name,
  struct cb_id *id,
- void (*callback)(struct cn_msg *,
+ void (*callback)(struct net *, struct cn_msg *,
   struct netlink_skb_parms *))
 {
struct cn_callback_entry *cbq, *__cbq;
diff --git a/drivers/connector/connector.c b/drivers/connector/connector.c
index 2d22d6bf52f2..82fcaa4d8be3 100644
--- a/drivers/connector/connector.c
+++ b/drivers/connector/connector.c
@@ -58,8 +58,8 @@ static int cn_already_initialized;
  * The message is sent to, the portid if given, the group if given, both if
  * both, or if both are zero then the group is looked up and sent there.
  */
-int cn_netlink_send_mult(struct cn_msg *msg, u16 len, u32 portid, u32 __group,
-   gfp_t gfp_mask)
+int cn_netlink_send_mult(struct net *net, struct cn_msg *msg, u16 len,
+u32 portid, u32 __group, gfp_t gfp_mask)
 {
struct cn_callback_entry *__cbq;
unsigned int size;
@@ -118,17 +118,18 @@ int cn_netlink_send_mult(struct cn_msg *msg, u16 len, u32 
portid, u32 __group,
 EXPORT_SYMBOL_GPL(cn_netlink_send_mult);
 
 /* same as cn_netlink_send_mult except msg->len is used for len */
-int cn_netlink_send(struct cn_msg *msg, u32 portid, u32 __group,
-   gfp_t gfp_mask)
+

[PATCH 2/5] connector: Use 'current_user_ns' function

2020-07-01 Thread Matt Bennett
In preparation for supporting the connector outside of the default
network namespace we switch to using this function now. As the connector
is still only supported in the default namespace this change is a no-op.

Signed-off-by: Matt Bennett 
---
 drivers/connector/cn_proc.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/connector/cn_proc.c b/drivers/connector/cn_proc.c
index 36a7823c56ec..d90aea555a21 100644
--- a/drivers/connector/cn_proc.c
+++ b/drivers/connector/cn_proc.c
@@ -139,11 +139,11 @@ void proc_id_connector(struct task_struct *task, int 
which_id)
rcu_read_lock();
cred = __task_cred(task);
if (which_id == PROC_EVENT_UID) {
-   ev->event_data.id.r.ruid = from_kuid_munged(_user_ns, 
cred->uid);
-   ev->event_data.id.e.euid = from_kuid_munged(_user_ns, 
cred->euid);
+   ev->event_data.id.r.ruid = from_kuid_munged(current_user_ns(), 
cred->uid);
+   ev->event_data.id.e.euid = from_kuid_munged(current_user_ns(), 
cred->euid);
} else if (which_id == PROC_EVENT_GID) {
-   ev->event_data.id.r.rgid = from_kgid_munged(_user_ns, 
cred->gid);
-   ev->event_data.id.e.egid = from_kgid_munged(_user_ns, 
cred->egid);
+   ev->event_data.id.r.rgid = from_kgid_munged(current_user_ns(), 
cred->gid);
+   ev->event_data.id.e.egid = from_kgid_munged(current_user_ns(), 
cred->egid);
} else {
rcu_read_unlock();
return;
@@ -362,7 +362,7 @@ static void cn_proc_mcast_ctl(struct cn_msg *msg,
return;
 
/* Can only change if privileged. */
-   if (!__netlink_ns_capable(nsp, _user_ns, CAP_NET_ADMIN)) {
+   if (!__netlink_ns_capable(nsp, current_user_ns(), CAP_NET_ADMIN)) {
err = EPERM;
goto out;
}
-- 
2.27.0



[PATCH 0/5] RFC: connector: Add network namespace awareness

2020-07-01 Thread Matt Bennett
Previously the connector functionality could only be used by processes running 
in the
default network namespace. This meant that any process that uses the connector 
functionality
could not operate correctly when run inside a container. This is a draft patch 
series that
attempts to now allow this functionality outside of the default network 
namespace.

I see this has been discussed previously [1], but am not sure how my changes 
relate to all
of the topics discussed there and/or if there are any unintended side effects 
from my draft
changes.

Thanks.

[1] https://marc.info/?l=linux-kernel=150806196728365=2

Matt Bennett (5):
  connector: Use task pid helpers
  connector: Use 'current_user_ns' function
  connector: Ensure callback entry is released
  connector: Prepare for supporting multiple namespaces
  connector: Create connector per namespace

 Documentation/driver-api/connector.rst |   6 +-
 drivers/connector/cn_proc.c| 110 +++---
 drivers/connector/cn_queue.c   |   9 +-
 drivers/connector/connector.c  | 192 -
 drivers/hv/hv_fcopy.c  |   1 +
 drivers/hv/hv_utils_transport.c|   6 +-
 drivers/md/dm-log-userspace-transfer.c |   6 +-
 drivers/video/fbdev/uvesafb.c  |   8 +-
 drivers/w1/w1_netlink.c|  19 +--
 include/linux/connector.h  |  38 +++--
 include/net/net_namespace.h|   4 +
 kernel/exit.c  |   2 +-
 samples/connector/cn_test.c|   6 +-
 13 files changed, 286 insertions(+), 121 deletions(-)

-- 
2.27.0



Re: [PATCH] mmc: core: Check for timeout before checking mmc device state

2015-05-11 Thread Matt Bennett
Hi Uffe,

We are using the Octeon mmc host driver supplied from the Cavium SDK (I
don't believe it is released to upstream linux). We have both a mmc
flash memory device and an SD card reader attached to the mmc bus. 

In the host driver code their is a mutex which must be obtained before
the driver can access the mmc bus. This stops the mmc flash and SD card
reader being written to in parallel (otherwise the signal on the bus
will be corrupted). It doesn't prevent parallel requests, it's just that
the second request will block on this mutex until the first request has
been completed.

In our specific case the following is occurring:

1. mmc_blk_part_switch() is called to switch partition on the mmc flash
device. This calls mmc_switch with a timeout_ms value of
'card->ext_csd.part_time' which is 10ms in this case.

2. In __mmc_switch() the command to switch partition is sent to the mmc
flash.

3. Between the command being sent to the flash and then the host polling
the status of the device (no busy detection hardware) a read or write
operation is begun on the SD card (in our case a Specification Version
2.00 card). In my testing I have seen the bus be blocked up to 800ms
while completing this operation.

4. The host polls the device for the status but blocks the first time on
the mutex for ~800ms while the SD card operation completes.

5. Finally the host gains the mutex and gets the status from the flash
device.

In my testing at this stage the status was never still
'R1_STATE_PRG' (it has been 800ms since the command was sent after all).
However the timeout check fails because it has been 800ms compared to
the original timeout_ms value passed in of 10ms. Therefore even though
the device has left the 'R1_STATE_PRG' state we return early with an
error that eventually gets printed to the log. This does not affect any
functionality as the host will simply try to switch the partition again
and if the bus does not block again then there are no issues.

By putting the timeout check before we read the status of the device
(and potentially block for longer than the timeout) we don't return an
early error if the device has indeed left the programming state. We
might as well continue through the function as after we return the error
the host is just going to issue the command again. 

Please excuse me if I have missed something fundamental.

Thanks,
Matt



On Mon, 2015-05-11 at 12:00 +0200, Ulf Hansson wrote:
> On 8 May 2015 at 04:40, Matt Bennett  wrote:
> > On a system that has multiple devices on the mmc bus the host can
> > block on the mutex that protects access to the bus. Some operations
> 
> What mutex are you referring to?
> 
> And why, exactly, does it prevent parallel requests on different cards
> (devices)?
> 
> Kind regards
> Uffe
> 
> > require the status of the device to be polled to see when the device
> > finishes executing the previous command that was sent to it (if
> > there is no busy detection in hardware). The current execution order
> > to check the status is:
> >
> > LOOP
> > {
> >   1. Send command to device to retrieve the status (this can block).
> >   2. Check we haven't exceeded the timeout value. If we have then
> >  return an error.
> >   3. If the device is no longer in the program state then exit the
> >  loop and continue through the function.
> > }
> >
> > If the send command blocks (and the timeout is exceeded) then the
> > function returns (and prints) an error even though the device has
> > likely left the programming state (due to the lengthy period of
> > time while the bus is blocked). By moving the timeout check before
> > retrieving the device status in the loop we better handle the case
> > where the mmc bus has been blocked but the device has left the
> > programming state.
> >
> > Signed-off-by: Matt Bennett 
> > Cc: kuninori.morimoto...@renesas.com
> > Cc: jh80.ch...@samsung.com
> > Cc: sb...@codeaurora.org
> > Cc: johan.rudh...@axis.com
> > Cc: linux-kernel@vger.kernel.org
> > Cc: linux-...@vger.kernel.org
> > ---
> >  drivers/mmc/card/block.c   | 22 +++---
> >  drivers/mmc/core/core.c| 20 ++--
> >  drivers/mmc/core/mmc_ops.c | 14 +++---
> >  3 files changed, 28 insertions(+), 28 deletions(-)
> >
> > diff --git a/drivers/mmc/card/block.c b/drivers/mmc/card/block.c
> > index 2c25271..0abefde 100644
> > --- a/drivers/mmc/card/block.c
> > +++ b/drivers/mmc/card/block.c
> > @@ -747,6 +747,17 @@ static int card_busy_detect(struct mmc_card *card, 
> > unsigned int timeout_ms,
> > u32 status;
> >
> > do {
> > +   /*
> > +* Timeout if the device

Re: [PATCH] mmc: core: Check for timeout before checking mmc device state

2015-05-11 Thread Matt Bennett
Hi Uffe,

We are using the Octeon mmc host driver supplied from the Cavium SDK (I
don't believe it is released to upstream linux). We have both a mmc
flash memory device and an SD card reader attached to the mmc bus. 

In the host driver code their is a mutex which must be obtained before
the driver can access the mmc bus. This stops the mmc flash and SD card
reader being written to in parallel (otherwise the signal on the bus
will be corrupted). It doesn't prevent parallel requests, it's just that
the second request will block on this mutex until the first request has
been completed.

In our specific case the following is occurring:

1. mmc_blk_part_switch() is called to switch partition on the mmc flash
device. This calls mmc_switch with a timeout_ms value of
'card-ext_csd.part_time' which is 10ms in this case.

2. In __mmc_switch() the command to switch partition is sent to the mmc
flash.

3. Between the command being sent to the flash and then the host polling
the status of the device (no busy detection hardware) a read or write
operation is begun on the SD card (in our case a Specification Version
2.00 card). In my testing I have seen the bus be blocked up to 800ms
while completing this operation.

4. The host polls the device for the status but blocks the first time on
the mutex for ~800ms while the SD card operation completes.

5. Finally the host gains the mutex and gets the status from the flash
device.

In my testing at this stage the status was never still
'R1_STATE_PRG' (it has been 800ms since the command was sent after all).
However the timeout check fails because it has been 800ms compared to
the original timeout_ms value passed in of 10ms. Therefore even though
the device has left the 'R1_STATE_PRG' state we return early with an
error that eventually gets printed to the log. This does not affect any
functionality as the host will simply try to switch the partition again
and if the bus does not block again then there are no issues.

By putting the timeout check before we read the status of the device
(and potentially block for longer than the timeout) we don't return an
early error if the device has indeed left the programming state. We
might as well continue through the function as after we return the error
the host is just going to issue the command again. 

Please excuse me if I have missed something fundamental.

Thanks,
Matt



On Mon, 2015-05-11 at 12:00 +0200, Ulf Hansson wrote:
 On 8 May 2015 at 04:40, Matt Bennett matt.benn...@alliedtelesis.co.nz wrote:
  On a system that has multiple devices on the mmc bus the host can
  block on the mutex that protects access to the bus. Some operations
 
 What mutex are you referring to?
 
 And why, exactly, does it prevent parallel requests on different cards
 (devices)?
 
 Kind regards
 Uffe
 
  require the status of the device to be polled to see when the device
  finishes executing the previous command that was sent to it (if
  there is no busy detection in hardware). The current execution order
  to check the status is:
 
  LOOP
  {
1. Send command to device to retrieve the status (this can block).
2. Check we haven't exceeded the timeout value. If we have then
   return an error.
3. If the device is no longer in the program state then exit the
   loop and continue through the function.
  }
 
  If the send command blocks (and the timeout is exceeded) then the
  function returns (and prints) an error even though the device has
  likely left the programming state (due to the lengthy period of
  time while the bus is blocked). By moving the timeout check before
  retrieving the device status in the loop we better handle the case
  where the mmc bus has been blocked but the device has left the
  programming state.
 
  Signed-off-by: Matt Bennett matt.benn...@alliedtelesis.co.nz
  Cc: kuninori.morimoto...@renesas.com
  Cc: jh80.ch...@samsung.com
  Cc: sb...@codeaurora.org
  Cc: johan.rudh...@axis.com
  Cc: linux-kernel@vger.kernel.org
  Cc: linux-...@vger.kernel.org
  ---
   drivers/mmc/card/block.c   | 22 +++---
   drivers/mmc/core/core.c| 20 ++--
   drivers/mmc/core/mmc_ops.c | 14 +++---
   3 files changed, 28 insertions(+), 28 deletions(-)
 
  diff --git a/drivers/mmc/card/block.c b/drivers/mmc/card/block.c
  index 2c25271..0abefde 100644
  --- a/drivers/mmc/card/block.c
  +++ b/drivers/mmc/card/block.c
  @@ -747,6 +747,17 @@ static int card_busy_detect(struct mmc_card *card, 
  unsigned int timeout_ms,
  u32 status;
 
  do {
  +   /*
  +* Timeout if the device never becomes ready for data and 
  never
  +* leaves the program state.
  +*/
  +   if (time_after(jiffies, timeout)) {
  +   pr_err(%s: Card stuck in programming state! %s 
  %s\n,
  +   mmc_hostname(card-host),
  +   req-rq_disk-disk_name, __func__

[PATCH] mmc: core: Check for timeout before checking mmc device state

2015-05-07 Thread Matt Bennett
On a system that has multiple devices on the mmc bus the host can
block on the mutex that protects access to the bus. Some operations
require the status of the device to be polled to see when the device
finishes executing the previous command that was sent to it (if
there is no busy detection in hardware). The current execution order
to check the status is:

LOOP
{
  1. Send command to device to retrieve the status (this can block).
  2. Check we haven't exceeded the timeout value. If we have then
 return an error.
  3. If the device is no longer in the program state then exit the
 loop and continue through the function.
}

If the send command blocks (and the timeout is exceeded) then the
function returns (and prints) an error even though the device has
likely left the programming state (due to the lengthy period of
time while the bus is blocked). By moving the timeout check before
retrieving the device status in the loop we better handle the case
where the mmc bus has been blocked but the device has left the
programming state.

Signed-off-by: Matt Bennett 
Cc: kuninori.morimoto...@renesas.com
Cc: jh80.ch...@samsung.com
Cc: sb...@codeaurora.org
Cc: johan.rudh...@axis.com
Cc: linux-kernel@vger.kernel.org
Cc: linux-...@vger.kernel.org
---
 drivers/mmc/card/block.c   | 22 +++---
 drivers/mmc/core/core.c| 20 ++--
 drivers/mmc/core/mmc_ops.c | 14 +++---
 3 files changed, 28 insertions(+), 28 deletions(-)

diff --git a/drivers/mmc/card/block.c b/drivers/mmc/card/block.c
index 2c25271..0abefde 100644
--- a/drivers/mmc/card/block.c
+++ b/drivers/mmc/card/block.c
@@ -747,6 +747,17 @@ static int card_busy_detect(struct mmc_card *card, 
unsigned int timeout_ms,
u32 status;
 
do {
+   /*
+* Timeout if the device never becomes ready for data and never
+* leaves the program state.
+*/
+   if (time_after(jiffies, timeout)) {
+   pr_err("%s: Card stuck in programming state! %s %s\n",
+   mmc_hostname(card->host),
+   req->rq_disk->disk_name, __func__);
+   return -ETIMEDOUT;
+   }
+
err = get_card_status(card, , 5);
if (err) {
pr_err("%s: error %d requesting status\n",
@@ -766,17 +777,6 @@ static int card_busy_detect(struct mmc_card *card, 
unsigned int timeout_ms,
break;
 
/*
-* Timeout if the device never becomes ready for data and never
-* leaves the program state.
-*/
-   if (time_after(jiffies, timeout)) {
-   pr_err("%s: Card stuck in programming state! %s %s\n",
-   mmc_hostname(card->host),
-   req->rq_disk->disk_name, __func__);
-   return -ETIMEDOUT;
-   }
-
-   /*
 * Some cards mishandle the status bits,
 * so make sure to check both the busy
 * indication and the card state.
diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
index c296bc0..6e56cb3 100644
--- a/drivers/mmc/core/core.c
+++ b/drivers/mmc/core/core.c
@@ -2047,6 +2047,16 @@ static int mmc_do_erase(struct mmc_card *card, unsigned 
int from,
 
timeout = jiffies + msecs_to_jiffies(MMC_CORE_TIMEOUT_MS);
do {
+   /* Timeout if the device never becomes ready for data and
+* never leaves the program state.
+*/
+   if (time_after(jiffies, timeout)) {
+   pr_err("%s: Card stuck in programming state! %s\n",
+   mmc_hostname(card->host), __func__);
+   err =  -EIO;
+   goto out;
+   }
+
memset(, 0, sizeof(struct mmc_command));
cmd.opcode = MMC_SEND_STATUS;
cmd.arg = card->rca << 16;
@@ -2060,16 +2070,6 @@ static int mmc_do_erase(struct mmc_card *card, unsigned 
int from,
goto out;
}
 
-   /* Timeout if the device never becomes ready for data and
-* never leaves the program state.
-*/
-   if (time_after(jiffies, timeout)) {
-   pr_err("%s: Card stuck in programming state! %s\n",
-   mmc_hostname(card->host), __func__);
-   err =  -EIO;
-   goto out;
-   }
-
} while (!(cmd.resp[0] & R1_READY_FOR_DATA) ||
 (R1_CURRENT_STATE(cmd.resp[0]) == R1_STATE_PRG));
 out:
diff --git a/drivers/mmc/core/mmc_ops.c b/drivers/mmc/core/mmc_ops.c
index 0ea042d..b30ed91 100644
---

[PATCH] mmc: core: Check for timeout before checking mmc device state

2015-05-07 Thread Matt Bennett
On a system that has multiple devices on the mmc bus the host can
block on the mutex that protects access to the bus. Some operations
require the status of the device to be polled to see when the device
finishes executing the previous command that was sent to it (if
there is no busy detection in hardware). The current execution order
to check the status is:

LOOP
{
  1. Send command to device to retrieve the status (this can block).
  2. Check we haven't exceeded the timeout value. If we have then
 return an error.
  3. If the device is no longer in the program state then exit the
 loop and continue through the function.
}

If the send command blocks (and the timeout is exceeded) then the
function returns (and prints) an error even though the device has
likely left the programming state (due to the lengthy period of
time while the bus is blocked). By moving the timeout check before
retrieving the device status in the loop we better handle the case
where the mmc bus has been blocked but the device has left the
programming state.

Signed-off-by: Matt Bennett matt.benn...@alliedtelesis.co.nz
Cc: kuninori.morimoto...@renesas.com
Cc: jh80.ch...@samsung.com
Cc: sb...@codeaurora.org
Cc: johan.rudh...@axis.com
Cc: linux-kernel@vger.kernel.org
Cc: linux-...@vger.kernel.org
---
 drivers/mmc/card/block.c   | 22 +++---
 drivers/mmc/core/core.c| 20 ++--
 drivers/mmc/core/mmc_ops.c | 14 +++---
 3 files changed, 28 insertions(+), 28 deletions(-)

diff --git a/drivers/mmc/card/block.c b/drivers/mmc/card/block.c
index 2c25271..0abefde 100644
--- a/drivers/mmc/card/block.c
+++ b/drivers/mmc/card/block.c
@@ -747,6 +747,17 @@ static int card_busy_detect(struct mmc_card *card, 
unsigned int timeout_ms,
u32 status;
 
do {
+   /*
+* Timeout if the device never becomes ready for data and never
+* leaves the program state.
+*/
+   if (time_after(jiffies, timeout)) {
+   pr_err(%s: Card stuck in programming state! %s %s\n,
+   mmc_hostname(card-host),
+   req-rq_disk-disk_name, __func__);
+   return -ETIMEDOUT;
+   }
+
err = get_card_status(card, status, 5);
if (err) {
pr_err(%s: error %d requesting status\n,
@@ -766,17 +777,6 @@ static int card_busy_detect(struct mmc_card *card, 
unsigned int timeout_ms,
break;
 
/*
-* Timeout if the device never becomes ready for data and never
-* leaves the program state.
-*/
-   if (time_after(jiffies, timeout)) {
-   pr_err(%s: Card stuck in programming state! %s %s\n,
-   mmc_hostname(card-host),
-   req-rq_disk-disk_name, __func__);
-   return -ETIMEDOUT;
-   }
-
-   /*
 * Some cards mishandle the status bits,
 * so make sure to check both the busy
 * indication and the card state.
diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
index c296bc0..6e56cb3 100644
--- a/drivers/mmc/core/core.c
+++ b/drivers/mmc/core/core.c
@@ -2047,6 +2047,16 @@ static int mmc_do_erase(struct mmc_card *card, unsigned 
int from,
 
timeout = jiffies + msecs_to_jiffies(MMC_CORE_TIMEOUT_MS);
do {
+   /* Timeout if the device never becomes ready for data and
+* never leaves the program state.
+*/
+   if (time_after(jiffies, timeout)) {
+   pr_err(%s: Card stuck in programming state! %s\n,
+   mmc_hostname(card-host), __func__);
+   err =  -EIO;
+   goto out;
+   }
+
memset(cmd, 0, sizeof(struct mmc_command));
cmd.opcode = MMC_SEND_STATUS;
cmd.arg = card-rca  16;
@@ -2060,16 +2070,6 @@ static int mmc_do_erase(struct mmc_card *card, unsigned 
int from,
goto out;
}
 
-   /* Timeout if the device never becomes ready for data and
-* never leaves the program state.
-*/
-   if (time_after(jiffies, timeout)) {
-   pr_err(%s: Card stuck in programming state! %s\n,
-   mmc_hostname(card-host), __func__);
-   err =  -EIO;
-   goto out;
-   }
-
} while (!(cmd.resp[0]  R1_READY_FOR_DATA) ||
 (R1_CURRENT_STATE(cmd.resp[0]) == R1_STATE_PRG));
 out:
diff --git a/drivers/mmc/core/mmc_ops.c b/drivers/mmc/core/mmc_ops.c
index 0ea042d..b30ed91 100644
--- a/drivers/mmc/core/mmc_ops.c
+++ b/drivers/mmc/core/mmc_ops.c
@@ -526,6

Re: [PATCH] MIPS: OCTEON: Stop .bss memory being allocated to the kernel

2015-05-04 Thread Matt Bennett
I mistakenly only included David in the original patch email so I have
added the other people from get_maintainer.pl. I believe this patch
could be fixing a rather significant problem (a potential kernel panic)
so some feedback would be appreciated.

Thanks,
Matt 

On Thu, 2015-04-30 at 11:48 +1200, Matt Bennett wrote:
> During development work on a 3.16 based kernel it was found that a
> number of builds would panic during the kernel init process, more
> specifically in 'delayed_fput()'. The panic showed the kernel trying
> to access a memory address of '0xb7fdc00' while traversing the
> 'delayed_fput_list' structure. Comparing this memory address to the
> value of the pointer used on builds that did not panic confirmed
> that the pointer on crashing builds must have been corrupted at some
> stage earlier in the init process.
> 
> By traversing the list earlier and earlier in the code it was found
> that 'plat_mem_setup()' was responsible for corrupting the list.
> Specifically the line:
> 
> memory = cvmx_bootmem_phy_alloc(mem_alloc_size,
>   __pa_symbol(&__init_end), -1,
>   0x10,
>   CVMX_BOOTMEM_FLAG_NO_LOCKING);
> 
> Which would eventually call:
> 
> cvmx_bootmem_phy_set_size(new_ent_addr,
>   cvmx_bootmem_phy_get_size
>   (ent_addr) -
>   (desired_min_addr -
>   ent_addr));
> 
> Where 'new_ent_addr'=0x480 (the address of 'delayed_fput_list')
> and the second argument (size)=0xb7fdc00 (the address causing the
> kernel panic). The job of this part of 'plat_mem_setup()' is to
> allocate chunks of memory for the kernel to use. At the start of
> each chunk of memory the size of the chunk is written, hence the
> value 0xb7fdc00 is written onto memory at 0x480, therefore the
> kernel panics when it goes back to access 'delayed_fput_list' later
> on in the initialisation process.
> 
> On builds that were not crashing it was found that the compiler had
> placed 'delayed_fput_list' at 0x488, meaning it wasn't corrupted
> (but something else in memory was overwritten).
> 
> As can be seen in the first function call above the code begins to
> allocate chunks of memory beginning from the symbol '__init_end'.
> The MIPS linker script (vmlinux.lds.S) however defines the .bss
> section to begin after '__init_end'. Therefore memory within the
> .bss section is allocated to the kernel to use (System.map shows
> 'delayed_fput_list' and other kernel structures to be in .bss).
> 
> To stop the kernel panic (and the .bss section being corrupted)
> memory should begin being allocated from the symbol '_end'.
> 
> Signed-off-by: Matt Bennett 
> Cc: linux-m...@linux-mips.org
> Cc: linux-kernel@vger.kernel.org
> ---
>  arch/mips/cavium-octeon/setup.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/mips/cavium-octeon/setup.c b/arch/mips/cavium-octeon/setup.c
> index 7e4367b..f632f14 100644
> --- a/arch/mips/cavium-octeon/setup.c
> +++ b/arch/mips/cavium-octeon/setup.c
> @@ -1008,7 +1008,7 @@ void __init plat_mem_setup(void)
>   while ((boot_mem_map.nr_map < BOOT_MEM_MAP_MAX)
>   && (total < MAX_MEMORY)) {
>   memory = cvmx_bootmem_phy_alloc(mem_alloc_size,
> - __pa_symbol(&__init_end), -1,
> + __pa_symbol(&_end), -1,
>   0x10,
>   CVMX_BOOTMEM_FLAG_NO_LOCKING);
>   if (memory >= 0) {



Re: [PATCH] MIPS: OCTEON: Stop .bss memory being allocated to the kernel

2015-05-04 Thread Matt Bennett
I mistakenly only included David in the original patch email so I have
added the other people from get_maintainer.pl. I believe this patch
could be fixing a rather significant problem (a potential kernel panic)
so some feedback would be appreciated.

Thanks,
Matt 

On Thu, 2015-04-30 at 11:48 +1200, Matt Bennett wrote:
 During development work on a 3.16 based kernel it was found that a
 number of builds would panic during the kernel init process, more
 specifically in 'delayed_fput()'. The panic showed the kernel trying
 to access a memory address of '0xb7fdc00' while traversing the
 'delayed_fput_list' structure. Comparing this memory address to the
 value of the pointer used on builds that did not panic confirmed
 that the pointer on crashing builds must have been corrupted at some
 stage earlier in the init process.
 
 By traversing the list earlier and earlier in the code it was found
 that 'plat_mem_setup()' was responsible for corrupting the list.
 Specifically the line:
 
 memory = cvmx_bootmem_phy_alloc(mem_alloc_size,
   __pa_symbol(__init_end), -1,
   0x10,
   CVMX_BOOTMEM_FLAG_NO_LOCKING);
 
 Which would eventually call:
 
 cvmx_bootmem_phy_set_size(new_ent_addr,
   cvmx_bootmem_phy_get_size
   (ent_addr) -
   (desired_min_addr -
   ent_addr));
 
 Where 'new_ent_addr'=0x480 (the address of 'delayed_fput_list')
 and the second argument (size)=0xb7fdc00 (the address causing the
 kernel panic). The job of this part of 'plat_mem_setup()' is to
 allocate chunks of memory for the kernel to use. At the start of
 each chunk of memory the size of the chunk is written, hence the
 value 0xb7fdc00 is written onto memory at 0x480, therefore the
 kernel panics when it goes back to access 'delayed_fput_list' later
 on in the initialisation process.
 
 On builds that were not crashing it was found that the compiler had
 placed 'delayed_fput_list' at 0x488, meaning it wasn't corrupted
 (but something else in memory was overwritten).
 
 As can be seen in the first function call above the code begins to
 allocate chunks of memory beginning from the symbol '__init_end'.
 The MIPS linker script (vmlinux.lds.S) however defines the .bss
 section to begin after '__init_end'. Therefore memory within the
 .bss section is allocated to the kernel to use (System.map shows
 'delayed_fput_list' and other kernel structures to be in .bss).
 
 To stop the kernel panic (and the .bss section being corrupted)
 memory should begin being allocated from the symbol '_end'.
 
 Signed-off-by: Matt Bennett matt.benn...@alliedtelesis.co.nz
 Cc: linux-m...@linux-mips.org
 Cc: linux-kernel@vger.kernel.org
 ---
  arch/mips/cavium-octeon/setup.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)
 
 diff --git a/arch/mips/cavium-octeon/setup.c b/arch/mips/cavium-octeon/setup.c
 index 7e4367b..f632f14 100644
 --- a/arch/mips/cavium-octeon/setup.c
 +++ b/arch/mips/cavium-octeon/setup.c
 @@ -1008,7 +1008,7 @@ void __init plat_mem_setup(void)
   while ((boot_mem_map.nr_map  BOOT_MEM_MAP_MAX)
(total  MAX_MEMORY)) {
   memory = cvmx_bootmem_phy_alloc(mem_alloc_size,
 - __pa_symbol(__init_end), -1,
 + __pa_symbol(_end), -1,
   0x10,
   CVMX_BOOTMEM_FLAG_NO_LOCKING);
   if (memory = 0) {



[PATCH] MIPS: OCTEON: Stop .bss memory being allocated to the kernel

2015-04-29 Thread Matt Bennett
During development work on a 3.16 based kernel it was found that a
number of builds would panic during the kernel init process, more
specifically in 'delayed_fput()'. The panic showed the kernel trying
to access a memory address of '0xb7fdc00' while traversing the
'delayed_fput_list' structure. Comparing this memory address to the
value of the pointer used on builds that did not panic confirmed
that the pointer on crashing builds must have been corrupted at some
stage earlier in the init process.

By traversing the list earlier and earlier in the code it was found
that 'plat_mem_setup()' was responsible for corrupting the list.
Specifically the line:

memory = cvmx_bootmem_phy_alloc(mem_alloc_size,
__pa_symbol(&__init_end), -1,
0x10,
CVMX_BOOTMEM_FLAG_NO_LOCKING);

Which would eventually call:

cvmx_bootmem_phy_set_size(new_ent_addr,
cvmx_bootmem_phy_get_size
(ent_addr) -
(desired_min_addr -
ent_addr));

Where 'new_ent_addr'=0x480 (the address of 'delayed_fput_list')
and the second argument (size)=0xb7fdc00 (the address causing the
kernel panic). The job of this part of 'plat_mem_setup()' is to
allocate chunks of memory for the kernel to use. At the start of
each chunk of memory the size of the chunk is written, hence the
value 0xb7fdc00 is written onto memory at 0x480, therefore the
kernel panics when it goes back to access 'delayed_fput_list' later
on in the initialisation process.

On builds that were not crashing it was found that the compiler had
placed 'delayed_fput_list' at 0x488, meaning it wasn't corrupted
(but something else in memory was overwritten).

As can be seen in the first function call above the code begins to
allocate chunks of memory beginning from the symbol '__init_end'.
The MIPS linker script (vmlinux.lds.S) however defines the .bss
section to begin after '__init_end'. Therefore memory within the
.bss section is allocated to the kernel to use (System.map shows
'delayed_fput_list' and other kernel structures to be in .bss).

To stop the kernel panic (and the .bss section being corrupted)
memory should begin being allocated from the symbol '_end'.

Signed-off-by: Matt Bennett 
Cc: linux-m...@linux-mips.org
Cc: linux-kernel@vger.kernel.org
---
 arch/mips/cavium-octeon/setup.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/mips/cavium-octeon/setup.c b/arch/mips/cavium-octeon/setup.c
index 7e4367b..f632f14 100644
--- a/arch/mips/cavium-octeon/setup.c
+++ b/arch/mips/cavium-octeon/setup.c
@@ -1008,7 +1008,7 @@ void __init plat_mem_setup(void)
while ((boot_mem_map.nr_map < BOOT_MEM_MAP_MAX)
&& (total < MAX_MEMORY)) {
memory = cvmx_bootmem_phy_alloc(mem_alloc_size,
-   __pa_symbol(&__init_end), -1,
+   __pa_symbol(&_end), -1,
0x10,
CVMX_BOOTMEM_FLAG_NO_LOCKING);
if (memory >= 0) {
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] MIPS: OCTEON: Stop .bss memory being allocated to the kernel

2015-04-29 Thread Matt Bennett
During development work on a 3.16 based kernel it was found that a
number of builds would panic during the kernel init process, more
specifically in 'delayed_fput()'. The panic showed the kernel trying
to access a memory address of '0xb7fdc00' while traversing the
'delayed_fput_list' structure. Comparing this memory address to the
value of the pointer used on builds that did not panic confirmed
that the pointer on crashing builds must have been corrupted at some
stage earlier in the init process.

By traversing the list earlier and earlier in the code it was found
that 'plat_mem_setup()' was responsible for corrupting the list.
Specifically the line:

memory = cvmx_bootmem_phy_alloc(mem_alloc_size,
__pa_symbol(__init_end), -1,
0x10,
CVMX_BOOTMEM_FLAG_NO_LOCKING);

Which would eventually call:

cvmx_bootmem_phy_set_size(new_ent_addr,
cvmx_bootmem_phy_get_size
(ent_addr) -
(desired_min_addr -
ent_addr));

Where 'new_ent_addr'=0x480 (the address of 'delayed_fput_list')
and the second argument (size)=0xb7fdc00 (the address causing the
kernel panic). The job of this part of 'plat_mem_setup()' is to
allocate chunks of memory for the kernel to use. At the start of
each chunk of memory the size of the chunk is written, hence the
value 0xb7fdc00 is written onto memory at 0x480, therefore the
kernel panics when it goes back to access 'delayed_fput_list' later
on in the initialisation process.

On builds that were not crashing it was found that the compiler had
placed 'delayed_fput_list' at 0x488, meaning it wasn't corrupted
(but something else in memory was overwritten).

As can be seen in the first function call above the code begins to
allocate chunks of memory beginning from the symbol '__init_end'.
The MIPS linker script (vmlinux.lds.S) however defines the .bss
section to begin after '__init_end'. Therefore memory within the
.bss section is allocated to the kernel to use (System.map shows
'delayed_fput_list' and other kernel structures to be in .bss).

To stop the kernel panic (and the .bss section being corrupted)
memory should begin being allocated from the symbol '_end'.

Signed-off-by: Matt Bennett matt.benn...@alliedtelesis.co.nz
Cc: linux-m...@linux-mips.org
Cc: linux-kernel@vger.kernel.org
---
 arch/mips/cavium-octeon/setup.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/mips/cavium-octeon/setup.c b/arch/mips/cavium-octeon/setup.c
index 7e4367b..f632f14 100644
--- a/arch/mips/cavium-octeon/setup.c
+++ b/arch/mips/cavium-octeon/setup.c
@@ -1008,7 +1008,7 @@ void __init plat_mem_setup(void)
while ((boot_mem_map.nr_map  BOOT_MEM_MAP_MAX)
 (total  MAX_MEMORY)) {
memory = cvmx_bootmem_phy_alloc(mem_alloc_size,
-   __pa_symbol(__init_end), -1,
+   __pa_symbol(_end), -1,
0x10,
CVMX_BOOTMEM_FLAG_NO_LOCKING);
if (memory = 0) {
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/