Re: OF_DYNAMIC node lifecycle

2014-07-16 Thread Tyrel Datwyler
On 07/15/2014 10:33 PM, Grant Likely wrote:
 I've got another question about powerpc reconfiguration. I was looking
 at the dlpar_configure_connector() function in dlpar.c. I see that the
 function has the ability to process multiple nodes with additional
 sibling and child nodes. It appears to link them into a detached tree
 structure, and the function returns a pointer to the first node.
 
 All of the callers of that function then call dlpar_attach_node(),
 which calls of_attach_node(). However, of_attach_node() only handles a
 single node. It doesn't handle siblings or children. Is this a bug?
 Does the configure connector ever actually receive more than one node
 at once?

Yes, it is sometimes the case we will get a tree structure back of more
than one node. Under the proc interface implementation this just worked.
With the move to sysfs it appears we have a regression here. What makes
more sense here, for us to walk the tree calling of_attach_node, or to
move such tree walking logic into of_attach_node? From what I can tell
we are the only consumers of of_attach_node.

-Tyrel

 
 g.
 
 On Fri, Jun 27, 2014 at 8:41 AM, Nathan Fontenot nf...@austin.ibm.com wrote:
 On 06/27/2014 07:41 AM, Grant Likely wrote:
 On Thu, 26 Jun 2014 15:01:49 -0500, Nathan Fontenot nf...@austin.ibm.com 
 wrote:
 On 06/25/2014 03:24 PM, Grant Likely wrote:
 On Tue, 24 Jun 2014 15:10:55 -0500, Nathan Fontenot 
 nf...@austin.ibm.com wrote:
 heh! I have often thought about adding reference counting to device 
 tree
 properties.

 You horrible, horrible man.

 Yes. I are evil :)

 After looking again the work needed to add reference counts to properties
 would be huge. The few properties I am concerned with are specific to 
 powerpc
 so perhaps just adding an arch specific lock around updating those
 properties would work.

 Which code/properties? I'd like to have a look myself.

 /ibm,dynamic-reconfiguration-memory/ibm,dynamic-memory

 The property is updated in
 arch/powerpc/platforms/pseries/hotplug-memory.c:pseries_update_drconf_memory()

 Specifically, what do you need for the locking? Are you wanting to hold
 off additional changes while that function is executing? Pantelis is
 adding a mutex for device tree writers. Holding that mutex would prevent
 any changes from happening in the tree without affecting readers. Would
 that be sufficient?

 That would work.

 -Nathan

 ___
 Linuxppc-dev mailing list
 Linuxppc-dev@lists.ozlabs.org
 https://lists.ozlabs.org/listinfo/linuxppc-dev
 

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: OF_DYNAMIC node lifecycle

2014-07-16 Thread Grant Likely
On Wed, Jul 16, 2014 at 12:30 PM, Tyrel Datwyler
turtle.in.the.ker...@gmail.com wrote:
 On 07/15/2014 10:33 PM, Grant Likely wrote:
 I've got another question about powerpc reconfiguration. I was looking
 at the dlpar_configure_connector() function in dlpar.c. I see that the
 function has the ability to process multiple nodes with additional
 sibling and child nodes. It appears to link them into a detached tree
 structure, and the function returns a pointer to the first node.

 All of the callers of that function then call dlpar_attach_node(),
 which calls of_attach_node(). However, of_attach_node() only handles a
 single node. It doesn't handle siblings or children. Is this a bug?
 Does the configure connector ever actually receive more than one node
 at once?

 Yes, it is sometimes the case we will get a tree structure back of more
 than one node. Under the proc interface implementation this just worked.
 With the move to sysfs it appears we have a regression here. What makes
 more sense here, for us to walk the tree calling of_attach_node, or to
 move such tree walking logic into of_attach_node? From what I can tell
 we are the only consumers of of_attach_node.

That is very shortly going to change. The overlay code also uses
of_attach_node(). I can make of_attach_node() recurse over
descendants, but I'm also considering moving the powerpc code over to
the of_changeset series that Panto and I are working on.

Either way, the handling of multiple nodes should be common code. I
think the easiest is to put the recursion into of_attach_node(), at
least for fixing the bug. It can be reworked later.

g.
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: OF_DYNAMIC node lifecycle

2014-07-16 Thread Grant Likely
On Wed, Jul 16, 2014 at 2:57 PM, Grant Likely grant.lik...@linaro.org wrote:
 On Wed, Jul 16, 2014 at 12:30 PM, Tyrel Datwyler
 turtle.in.the.ker...@gmail.com wrote:
 On 07/15/2014 10:33 PM, Grant Likely wrote:
 I've got another question about powerpc reconfiguration. I was looking
 at the dlpar_configure_connector() function in dlpar.c. I see that the
 function has the ability to process multiple nodes with additional
 sibling and child nodes. It appears to link them into a detached tree
 structure, and the function returns a pointer to the first node.

 All of the callers of that function then call dlpar_attach_node(),
 which calls of_attach_node(). However, of_attach_node() only handles a
 single node. It doesn't handle siblings or children. Is this a bug?
 Does the configure connector ever actually receive more than one node
 at once?

 Yes, it is sometimes the case we will get a tree structure back of more
 than one node. Under the proc interface implementation this just worked.
 With the move to sysfs it appears we have a regression here. What makes
 more sense here, for us to walk the tree calling of_attach_node, or to
 move such tree walking logic into of_attach_node? From what I can tell
 we are the only consumers of of_attach_node.

 That is very shortly going to change. The overlay code also uses
 of_attach_node(). I can make of_attach_node() recurse over
 descendants, but I'm also considering moving the powerpc code over to
 the of_changeset series that Panto and I are working on.

 Either way, the handling of multiple nodes should be common code. I
 think the easiest is to put the recursion into of_attach_node(), at
 least for fixing the bug. It can be reworked later.

On pseries, do notifiers ever fail? ie. Does the reconfig code ever
object to a DT change and prevent it from being applied?

g.
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: OF_DYNAMIC node lifecycle

2014-07-16 Thread Nathan Fontenot
On 07/16/2014 05:26 PM, Grant Likely wrote:
 On Wed, Jul 16, 2014 at 2:57 PM, Grant Likely grant.lik...@linaro.org wrote:
 On Wed, Jul 16, 2014 at 12:30 PM, Tyrel Datwyler
 turtle.in.the.ker...@gmail.com wrote:
 On 07/15/2014 10:33 PM, Grant Likely wrote:
 I've got another question about powerpc reconfiguration. I was looking
 at the dlpar_configure_connector() function in dlpar.c. I see that the
 function has the ability to process multiple nodes with additional
 sibling and child nodes. It appears to link them into a detached tree
 structure, and the function returns a pointer to the first node.

 All of the callers of that function then call dlpar_attach_node(),
 which calls of_attach_node(). However, of_attach_node() only handles a
 single node. It doesn't handle siblings or children. Is this a bug?
 Does the configure connector ever actually receive more than one node
 at once?

 Yes, it is sometimes the case we will get a tree structure back of more
 than one node. Under the proc interface implementation this just worked.
 With the move to sysfs it appears we have a regression here. What makes
 more sense here, for us to walk the tree calling of_attach_node, or to
 move such tree walking logic into of_attach_node? From what I can tell
 we are the only consumers of of_attach_node.

 That is very shortly going to change. The overlay code also uses
 of_attach_node(). I can make of_attach_node() recurse over
 descendants, but I'm also considering moving the powerpc code over to
 the of_changeset series that Panto and I are working on.

 Either way, the handling of multiple nodes should be common code. I
 think the easiest is to put the recursion into of_attach_node(), at
 least for fixing the bug. It can be reworked later.
 
 On pseries, do notifiers ever fail? ie. Does the reconfig code ever
 object to a DT change and prevent it from being applied?
 
I cannot think of a time that  I ever saw a notifier fail.

-Nathan

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: OF_DYNAMIC node lifecycle

2014-07-16 Thread Grant Likely
On Wed, Jul 16, 2014 at 5:12 PM, Nathan Fontenot nf...@austin.ibm.com wrote:
 On 07/16/2014 05:26 PM, Grant Likely wrote:
 On Wed, Jul 16, 2014 at 2:57 PM, Grant Likely grant.lik...@linaro.org 
 wrote:
 On Wed, Jul 16, 2014 at 12:30 PM, Tyrel Datwyler
 turtle.in.the.ker...@gmail.com wrote:
 On 07/15/2014 10:33 PM, Grant Likely wrote:
 I've got another question about powerpc reconfiguration. I was looking
 at the dlpar_configure_connector() function in dlpar.c. I see that the
 function has the ability to process multiple nodes with additional
 sibling and child nodes. It appears to link them into a detached tree
 structure, and the function returns a pointer to the first node.

 All of the callers of that function then call dlpar_attach_node(),
 which calls of_attach_node(). However, of_attach_node() only handles a
 single node. It doesn't handle siblings or children. Is this a bug?
 Does the configure connector ever actually receive more than one node
 at once?

 Yes, it is sometimes the case we will get a tree structure back of more
 than one node. Under the proc interface implementation this just worked.
 With the move to sysfs it appears we have a regression here. What makes
 more sense here, for us to walk the tree calling of_attach_node, or to
 move such tree walking logic into of_attach_node? From what I can tell
 we are the only consumers of of_attach_node.

 That is very shortly going to change. The overlay code also uses
 of_attach_node(). I can make of_attach_node() recurse over
 descendants, but I'm also considering moving the powerpc code over to
 the of_changeset series that Panto and I are working on.

 Either way, the handling of multiple nodes should be common code. I
 think the easiest is to put the recursion into of_attach_node(), at
 least for fixing the bug. It can be reworked later.

 On pseries, do notifiers ever fail? ie. Does the reconfig code ever
 object to a DT change and prevent it from being applied?

 I cannot think of a time that  I ever saw a notifier fail.

Good to know. I was hoping it wasn't part of the design. I can't think
of any situation where the kernel would want to inhibit a change to
the device tree.

Can you take a look at the following tree and give it a spin on
PowerPC. I've reordered the notifiers and hopefully got the powerpc
fixups right, but I don't have a way to test it...

The following changes since commit cd3de83f147601356395b57a8673e9c5ff1e59d1:

  Linux 3.16-rc4 (2014-07-06 12:37:51 -0700)

are available in the git repository at:

  git://git.secretlab.ca/git/linux devicetree/next-overlay

for you to fetch changes up to 44ac93bb5583c5f1f85912309fe04045b61a6dd0:

  of: Transactional DT support. (2014-07-16 16:44:07 -0600)


Grant Likely (6):
  of/platform: Fix of_platform_device_destroy iteration of devices
  of: Move CONFIG_OF_DYNAMIC code into a separate file
  of: Make sure attached nodes don't carry along extra children
  of: Move dynamic node fixups out of powerpc and into common code
  of: Make OF_DYNAMIC user selectable
  of: Reorder device tree changes and notifiers

Pantelis Antoniou (5):
  of: rename of_aliases_mutex to just of_mutex
  OF: Utility helper functions for dynamic nodes
  of: Create unlocked versions of node and property add/remove functions
  of: Make devicetree sysfs update functions consistent.
  of: Transactional DT support.

 Documentation/devicetree/changesets.txt |  41 ++
 arch/powerpc/kernel/prom.c  |  70 ---
 arch/powerpc/platforms/pseries/hotplug-memory.c |   2 +-
 drivers/crypto/nx/nx-842.c  |  30 +-
 drivers/of/Kconfig  |   2 +-
 drivers/of/Makefile |   1 +
 drivers/of/base.c   | 423 ---
 drivers/of/device.c |   4 +-
 drivers/of/dynamic.c| 671 
 drivers/of/of_private.h |  59 ++-
 drivers/of/platform.c   |  32 +-
 drivers/of/selftest.c   |  79 +++
 drivers/of/testcase-data/testcases.dtsi |  10 +
 include/linux/of.h  |  80 ++-
 include/linux/of_platform.h |   7 +-
 15 files changed, 1067 insertions(+), 444 deletions(-)
 create mode 100644 Documentation/devicetree/changesets.txt
 create mode 100644 drivers/of/dynamic.c
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: OF_DYNAMIC node lifecycle

2014-07-15 Thread Grant Likely
I've got another question about powerpc reconfiguration. I was looking
at the dlpar_configure_connector() function in dlpar.c. I see that the
function has the ability to process multiple nodes with additional
sibling and child nodes. It appears to link them into a detached tree
structure, and the function returns a pointer to the first node.

All of the callers of that function then call dlpar_attach_node(),
which calls of_attach_node(). However, of_attach_node() only handles a
single node. It doesn't handle siblings or children. Is this a bug?
Does the configure connector ever actually receive more than one node
at once?

g.

On Fri, Jun 27, 2014 at 8:41 AM, Nathan Fontenot nf...@austin.ibm.com wrote:
 On 06/27/2014 07:41 AM, Grant Likely wrote:
 On Thu, 26 Jun 2014 15:01:49 -0500, Nathan Fontenot nf...@austin.ibm.com 
 wrote:
 On 06/25/2014 03:24 PM, Grant Likely wrote:
 On Tue, 24 Jun 2014 15:10:55 -0500, Nathan Fontenot nf...@austin.ibm.com 
 wrote:
 heh! I have often thought about adding reference counting to device tree
 properties.

 You horrible, horrible man.

 Yes. I are evil :)

 After looking again the work needed to add reference counts to properties
 would be huge. The few properties I am concerned with are specific to 
 powerpc
 so perhaps just adding an arch specific lock around updating those
 properties would work.

 Which code/properties? I'd like to have a look myself.

 /ibm,dynamic-reconfiguration-memory/ibm,dynamic-memory

 The property is updated in
 arch/powerpc/platforms/pseries/hotplug-memory.c:pseries_update_drconf_memory()

 Specifically, what do you need for the locking? Are you wanting to hold
 off additional changes while that function is executing? Pantelis is
 adding a mutex for device tree writers. Holding that mutex would prevent
 any changes from happening in the tree without affecting readers. Would
 that be sufficient?

 That would work.

 -Nathan

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: OF_DYNAMIC node lifecycle

2014-06-27 Thread Grant Likely
On Thu, 26 Jun 2014 14:59:31 -0500, Nathan Fontenot nf...@austin.ibm.com 
wrote:
 On 06/25/2014 03:22 PM, Grant Likely wrote:
  On Tue, 24 Jun 2014 15:07:05 -0500, Nathan Fontenot nf...@austin.ibm.com 
  wrote:
  On 06/23/2014 09:58 AM, Grant Likely wrote:
  On Thu, 19 Jun 2014 11:33:20 +0300, Pantelis Antoniou 
  pantelis.anton...@konsulko.com wrote:
  Hi Grant,
 
  CCing Thomas Gleixner  Steven Rostedt, since they might have a few
  ideas...
 
  On Jun 18, 2014, at 11:07 PM, Grant Likely wrote:
 
  Hi Nathan and Tyrel,
 
  I'm looking into lifecycle issues on nodes modified by OF_DYNAMIC, and
  I'm hoping you can help me. Right now, pseries seems to be the only
  user of OF_DYNAMIC, but making OF_DYNAMIC work has a huge impact on
  the entire kernel because it requires all DT code to manage reference
  counting with iterating over nodes. Most users simply get it wrong.
  Pantelis did some investigation and found that the reference counts on
  a running kernel are all over the place. I have my doubts that any
  code really gets it right.
 
  The problem is that users need to know when it is appropriate to call
  of_node_get()/of_node_put(). All list traversals that exit early need
  an extra call to of_node_put(), and code that is searching for a node
  in the tree and holding a reference to it needs to call of_node_get().
 
 
  In hindsight it appears that drivers just can't get the lifecycle right.
  So we need to simplify things.
 
  I've got a few pseries questions:
  - What are the changes being requested by pseries firmware? Is it only
  CPUs and memory nodes, or does it manipulate things all over the tree?
  - How frequent are the changes? How many changes would be likely over
  the runtime of the system?
  - Are you able to verify that removed nodes are actually able to be
  freed correctly? Do you have any testcases for node removal?
 
  I'm thinking very seriously about changing the locking semantics of DT
  code entirely so that most users never have to worry about
  of_node_get/put at all. If the DT code is switched to use rcu
  primitives for tree iteration (which also means making DT code use
  list_head, something I'm already investigating), then instead of
  trying to figure out of_node_get/put rules, callers could use
  rcu_read_lock()/rcu_read_unlock() to protect the region that is
  searching over nodes, and only call of_node_get() if the node pointer
  is needed outside the rcu read-side lock.
 
  I'd really like to be rid of the node reference counting entirely, but
  I can't figure out a way of doing that safely, so I'd settle for
  making it a lot easier to get correct.
 
 
  Since we're going about changing things, how about that devtree_lock?
 
  I believe rcu would pretty much eliminate the devtree_lock entirely. All
  modifiers would need to grab a mutex to ensure there is only one writer
  at any given time, but readers would have free reign to parse the tree
  however they like.
 
  DT writers would have to follow some strict rules about how to handle
  nodes that are removed (ie. don't modify or of_node_put() them until
  after rcu is syncronized), but the number of writers is very small and
  we have control of all of them.
 
  We're using a raw_spinlock and we're always taking the lock with
  interrupts disabled.
 
  If we're going to make DT changes frequently during normal runtime
  and not only during boot time, those are bad for any kind of real-time
  performance.
 
  So the question is, do we really have code that access the live tree
  during atomic sections?  Is that something we want? Enforcing this
  will make our lives easier, and we'll get the change to replace
  that spinlock with a mutex.
 
  Yes, I believe the powerpc CPU hotplug code accesses the DT in atomic
  sections. I cannot put my finger on the exact code however. Nathan might
  know better. But, if I'm right, the whole problem goes away with RCU.
 
  I went back through the cpu hotplug code. we do update the DT during cpu
  hotplug but I don't see it happening during atomic sections.
 
  The code is in arch/powerpc/platforms/pseries/dlpar.c
  
  Great, thanks,
  
  By the way, notifiers currently get sent before any updates are applied
  to the tree. I want to change it so that the notifier gets sent
  afterwards. Does that work for you? I've looked through all the users
  and aside from a stupid block of code in arch/powerpc/kernel/prom.c
  which does things that should be done by of_attach_node(), it looks like
  everything should be fine.
 
 This would affect property updates. When doing a property update the
 notifier passes a pointer to a struct containing a device node
 pointer and a pointer to the new device node property.
 
 I know specifically in memory property updates we grab the current version
 of the device tree property and compare it to the 'new' version that 
 was passed to us.
 
 If you want to do the DT update before calling the notifier that should be
 fine for the memory update 

Re: OF_DYNAMIC node lifecycle

2014-06-27 Thread Pantelis Antoniou
Hi Grant,

On Jun 27, 2014, at 3:32 PM, Grant Likely wrote:

 On Thu, 26 Jun 2014 14:59:31 -0500, Nathan Fontenot nf...@austin.ibm.com 
 wrote:
 On 06/25/2014 03:22 PM, Grant Likely wrote:
 On Tue, 24 Jun 2014 15:07:05 -0500, Nathan Fontenot nf...@austin.ibm.com 
 wrote:
 On 06/23/2014 09:58 AM, Grant Likely wrote:
 On Thu, 19 Jun 2014 11:33:20 +0300, Pantelis Antoniou 
 pantelis.anton...@konsulko.com wrote:
 Hi Grant,
 

[snip]

 
 This would affect property updates. When doing a property update the
 notifier passes a pointer to a struct containing a device node
 pointer and a pointer to the new device node property.
 
 I know specifically in memory property updates we grab the current version
 of the device tree property and compare it to the 'new' version that 
 was passed to us.
 
 If you want to do the DT update before calling the notifier that should be
 fine for the memory update code and would only require very minimal
 updates.
 
 We could change the notifier to include both the old and new values.
 
 I've been thinking about changing the notifier format anyway. With the
 addition of bulk changes, it would be more efficient to send a single
 notifier for all the changes with a link to the change set instead of
 one at a time.
 

That one has my vote. We also need a bulk change notifier, and for device
driver use, some kind of wrapper for specific node/properties.

At the moment a notification is fired for any change in the tree, we might
work something more fine-grained. Like 'watch this node  subnodes', or
'watch this property (or set of properties)'

 g.


Regards

-- Pantelis
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: OF_DYNAMIC node lifecycle

2014-06-27 Thread Grant Likely
On Thu, 26 Jun 2014 15:01:49 -0500, Nathan Fontenot nf...@austin.ibm.com 
wrote:
 On 06/25/2014 03:24 PM, Grant Likely wrote:
  On Tue, 24 Jun 2014 15:10:55 -0500, Nathan Fontenot nf...@austin.ibm.com 
  wrote:
  heh! I have often thought about adding reference counting to device tree
  properties.
 
  You horrible, horrible man.
 
  Yes. I are evil :)
 
  After looking again the work needed to add reference counts to properties
  would be huge. The few properties I am concerned with are specific to 
  powerpc
  so perhaps just adding an arch specific lock around updating those
  properties would work.
  
  Which code/properties? I'd like to have a look myself.
 
 /ibm,dynamic-reconfiguration-memory/ibm,dynamic-memory
 
 The property is updated in 
 arch/powerpc/platforms/pseries/hotplug-memory.c:pseries_update_drconf_memory()

Specifically, what do you need for the locking? Are you wanting to hold
off additional changes while that function is executing? Pantelis is
adding a mutex for device tree writers. Holding that mutex would prevent
any changes from happening in the tree without affecting readers. Would
that be sufficient?

g.
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: OF_DYNAMIC node lifecycle

2014-06-27 Thread Nathan Fontenot
On 06/27/2014 07:40 AM, Pantelis Antoniou wrote:
 Hi Grant,
 
 On Jun 27, 2014, at 3:32 PM, Grant Likely wrote:
 
 On Thu, 26 Jun 2014 14:59:31 -0500, Nathan Fontenot nf...@austin.ibm.com 
 wrote:
 On 06/25/2014 03:22 PM, Grant Likely wrote:
 On Tue, 24 Jun 2014 15:07:05 -0500, Nathan Fontenot nf...@austin.ibm.com 
 wrote:
 On 06/23/2014 09:58 AM, Grant Likely wrote:
 On Thu, 19 Jun 2014 11:33:20 +0300, Pantelis Antoniou 
 pantelis.anton...@konsulko.com wrote:
 Hi Grant,

 
 [snip]
 

 This would affect property updates. When doing a property update the
 notifier passes a pointer to a struct containing a device node
 pointer and a pointer to the new device node property.

 I know specifically in memory property updates we grab the current version
 of the device tree property and compare it to the 'new' version that 
 was passed to us.

 If you want to do the DT update before calling the notifier that should be
 fine for the memory update code and would only require very minimal
 updates.

 We could change the notifier to include both the old and new values.

 I've been thinking about changing the notifier format anyway. With the
 addition of bulk changes, it would be more efficient to send a single
 notifier for all the changes with a link to the change set instead of
 one at a time.

 
 That one has my vote. We also need a bulk change notifier, and for device
 driver use, some kind of wrapper for specific node/properties.
 
 At the moment a notification is fired for any change in the tree, we might
 work something more fine-grained. Like 'watch this node  subnodes', or
 'watch this property (or set of properties)'
 

Both of these updates would work.

For property updates the only real requirement is that we can get to the new
and the old version of the property value.

I like the idea of being able to watch a single node/property. My experience
is that most code is only interested in updates to a single node or
property. Being able to avoid notifying everyone that has registered a
notifier for DT updates for every change would be nice.

-Nathan
-Nathan

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: OF_DYNAMIC node lifecycle

2014-06-27 Thread Nathan Fontenot
On 06/27/2014 07:41 AM, Grant Likely wrote:
 On Thu, 26 Jun 2014 15:01:49 -0500, Nathan Fontenot nf...@austin.ibm.com 
 wrote:
 On 06/25/2014 03:24 PM, Grant Likely wrote:
 On Tue, 24 Jun 2014 15:10:55 -0500, Nathan Fontenot nf...@austin.ibm.com 
 wrote:
 heh! I have often thought about adding reference counting to device tree
 properties.

 You horrible, horrible man.

 Yes. I are evil :)

 After looking again the work needed to add reference counts to properties
 would be huge. The few properties I am concerned with are specific to 
 powerpc
 so perhaps just adding an arch specific lock around updating those
 properties would work.

 Which code/properties? I'd like to have a look myself.

 /ibm,dynamic-reconfiguration-memory/ibm,dynamic-memory

 The property is updated in 
 arch/powerpc/platforms/pseries/hotplug-memory.c:pseries_update_drconf_memory()
 
 Specifically, what do you need for the locking? Are you wanting to hold
 off additional changes while that function is executing? Pantelis is
 adding a mutex for device tree writers. Holding that mutex would prevent
 any changes from happening in the tree without affecting readers. Would
 that be sufficient?

That would work.

-Nathan

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: OF_DYNAMIC node lifecycle

2014-06-26 Thread Nathan Fontenot
On 06/25/2014 03:22 PM, Grant Likely wrote:
 On Tue, 24 Jun 2014 15:07:05 -0500, Nathan Fontenot nf...@austin.ibm.com 
 wrote:
 On 06/23/2014 09:58 AM, Grant Likely wrote:
 On Thu, 19 Jun 2014 11:33:20 +0300, Pantelis Antoniou 
 pantelis.anton...@konsulko.com wrote:
 Hi Grant,

 CCing Thomas Gleixner  Steven Rostedt, since they might have a few
 ideas...

 On Jun 18, 2014, at 11:07 PM, Grant Likely wrote:

 Hi Nathan and Tyrel,

 I'm looking into lifecycle issues on nodes modified by OF_DYNAMIC, and
 I'm hoping you can help me. Right now, pseries seems to be the only
 user of OF_DYNAMIC, but making OF_DYNAMIC work has a huge impact on
 the entire kernel because it requires all DT code to manage reference
 counting with iterating over nodes. Most users simply get it wrong.
 Pantelis did some investigation and found that the reference counts on
 a running kernel are all over the place. I have my doubts that any
 code really gets it right.

 The problem is that users need to know when it is appropriate to call
 of_node_get()/of_node_put(). All list traversals that exit early need
 an extra call to of_node_put(), and code that is searching for a node
 in the tree and holding a reference to it needs to call of_node_get().


 In hindsight it appears that drivers just can't get the lifecycle right.
 So we need to simplify things.

 I've got a few pseries questions:
 - What are the changes being requested by pseries firmware? Is it only
 CPUs and memory nodes, or does it manipulate things all over the tree?
 - How frequent are the changes? How many changes would be likely over
 the runtime of the system?
 - Are you able to verify that removed nodes are actually able to be
 freed correctly? Do you have any testcases for node removal?

 I'm thinking very seriously about changing the locking semantics of DT
 code entirely so that most users never have to worry about
 of_node_get/put at all. If the DT code is switched to use rcu
 primitives for tree iteration (which also means making DT code use
 list_head, something I'm already investigating), then instead of
 trying to figure out of_node_get/put rules, callers could use
 rcu_read_lock()/rcu_read_unlock() to protect the region that is
 searching over nodes, and only call of_node_get() if the node pointer
 is needed outside the rcu read-side lock.

 I'd really like to be rid of the node reference counting entirely, but
 I can't figure out a way of doing that safely, so I'd settle for
 making it a lot easier to get correct.


 Since we're going about changing things, how about that devtree_lock?

 I believe rcu would pretty much eliminate the devtree_lock entirely. All
 modifiers would need to grab a mutex to ensure there is only one writer
 at any given time, but readers would have free reign to parse the tree
 however they like.

 DT writers would have to follow some strict rules about how to handle
 nodes that are removed (ie. don't modify or of_node_put() them until
 after rcu is syncronized), but the number of writers is very small and
 we have control of all of them.

 We're using a raw_spinlock and we're always taking the lock with
 interrupts disabled.

 If we're going to make DT changes frequently during normal runtime
 and not only during boot time, those are bad for any kind of real-time
 performance.

 So the question is, do we really have code that access the live tree
 during atomic sections?  Is that something we want? Enforcing this
 will make our lives easier, and we'll get the change to replace
 that spinlock with a mutex.

 Yes, I believe the powerpc CPU hotplug code accesses the DT in atomic
 sections. I cannot put my finger on the exact code however. Nathan might
 know better. But, if I'm right, the whole problem goes away with RCU.

 I went back through the cpu hotplug code. we do update the DT during cpu
 hotplug but I don't see it happening during atomic sections.

 The code is in arch/powerpc/platforms/pseries/dlpar.c
 
 Great, thanks,
 
 By the way, notifiers currently get sent before any updates are applied
 to the tree. I want to change it so that the notifier gets sent
 afterwards. Does that work for you? I've looked through all the users
 and aside from a stupid block of code in arch/powerpc/kernel/prom.c
 which does things that should be done by of_attach_node(), it looks like
 everything should be fine.

This would affect property updates. When doing a property update the
notifier passes a pointer to a struct containing a device node
pointer and a pointer to the new device node property.

I know specifically in memory property updates we grab the current version
of the device tree property and compare it to the 'new' version that 
was passed to us.

If you want to do the DT update before calling the notifier that should be
fine for the memory update code and would only require very minimal
updates.

It does look like OF_RECONFIG_UPDATE_PROPERTY is used in a couple of other
places that would need to be investigated.

-Nathan 


Re: OF_DYNAMIC node lifecycle

2014-06-26 Thread Nathan Fontenot
On 06/25/2014 03:24 PM, Grant Likely wrote:
 On Tue, 24 Jun 2014 15:10:55 -0500, Nathan Fontenot nf...@austin.ibm.com 
 wrote:
 On 06/23/2014 09:48 AM, Grant Likely wrote:
 On Thu, 19 Jun 2014 10:26:15 -0500, Nathan Fontenot nf...@austin.ibm.com 
 wrote:
 On 06/18/2014 03:07 PM, Grant Likely wrote:
 Hi Nathan and Tyrel,

 I'm looking into lifecycle issues on nodes modified by OF_DYNAMIC, and
 I'm hoping you can help me. Right now, pseries seems to be the only
 user of OF_DYNAMIC, but making OF_DYNAMIC work has a huge impact on
 the entire kernel because it requires all DT code to manage reference
 counting with iterating over nodes. Most users simply get it wrong.
 Pantelis did some investigation and found that the reference counts on
 a running kernel are all over the place. I have my doubts that any
 code really gets it right.

 The problem is that users need to know when it is appropriate to call
 of_node_get()/of_node_put(). All list traversals that exit early need
 an extra call to of_node_put(), and code that is searching for a node
 in the tree and holding a reference to it needs to call of_node_get().

 I've got a few pseries questions:
 - What are the changes being requested by pseries firmware? Is it only
 CPUs and memory nodes, or does it manipulate things all over the tree?

 The short answer, everything.

 :-)

 For pseries the two big actions that can change the device tree are
 adding/removing resources and partition migration.

 The most frequent updates to the device tree happen during resource
 (cpu, memory, and pci/phb) add and remove. During this process we add
 and remove the node and its properties from the device tree.
 - For memory on newer systems this just involves updating the
   ibm,dynamic-reconfiguration-memory/ibm,dynamic-memory property. Older
   firmware levels add and remove the memroy@XXX nodes and their properties.
 - For cpus the cpus/PowerPC,POWER nodes and its properties are added
   or removed
 - For pci/phb the pci@X nodes and properties are added/removed.

 The less frequent operation of live partition migration (and 
 suspend/resume)
 can update just about anything in the device tree. When this occurs and the
 systems starts after being migrated (or waking up after a suspend) we make
 a call to firmware to get updates to the device tree for the new hardware
 we are running on.
  
 - How frequent are the changes? How many changes would be likely over
 the runtime of the system?

 This can happen frequently.

 Thanks, that is exactly the information that I want. I'm not so much
 concerned with the addition or removal of nodes/properties, which is
 actually pretty easy to handle. It is the lifecycle of allocations on
 dynamic nodes that causes heartburn.

 - Are you able to verify that removed nodes are actually able to be
 freed correctly? Do you have any testcases for node removal?

 I have always tested this by doing resource add/remove, usually cpu and 
 memory
 since it is the easiest.

 Is that just testing the functionality, or do you have tests that check
 if the memory gets freed?

 In general it's just functionality testing.


 I'm thinking very seriously about changing the locking semantics of DT
 code entirely so that most users never have to worry about
 of_node_get/put at all. If the DT code is switched to use rcu
 primitives for tree iteration (which also means making DT code use
 list_head, something I'm already investigating), then instead of
 trying to figure out of_node_get/put rules, callers could use
 rcu_read_lock()/rcu_read_unlock() to protect the region that is
 searching over nodes, and only call of_node_get() if the node pointer
 is needed outside the rcu read-side lock.


 This sounds good. I like just taking the rcu lock around accessing the DT.
 Do we have many places where DT node pointers are held that require
 keeping the of_node_get/put calls? If this did exist perhaps we could
 update those places to look up the DT node every time instead of
 holding on to the pointer. We could just get rid of the reference counting
 altogether then.

 There are a few, but I would be happy to restrict reference counting to
 only those locations. Most places will decode the DT data, and then
 throw away the reference. We /might/ even be able to do rcu_lock/unlock
 around the entire probe path which would make it transparent to all
 device drivers.

 I'd really like to be rid of the node reference counting entirely, but
 I can't figure out a way of doing that safely, so I'd settle for
 making it a lot easier to get correct.


 heh! I have often thought about adding reference counting to device tree
 properties.

 You horrible, horrible man.

 Yes. I are evil :)

 After looking again the work needed to add reference counts to properties
 would be huge. The few properties I am concerned with are specific to powerpc
 so perhaps just adding an arch specific lock around updating those
 properties would work.
 
 Which code/properties? I'd like to 

Re: OF_DYNAMIC node lifecycle

2014-06-25 Thread Grant Likely
On Tue, 24 Jun 2014 15:07:05 -0500, Nathan Fontenot nf...@austin.ibm.com 
wrote:
 On 06/23/2014 09:58 AM, Grant Likely wrote:
  On Thu, 19 Jun 2014 11:33:20 +0300, Pantelis Antoniou 
  pantelis.anton...@konsulko.com wrote:
  Hi Grant,
 
  CCing Thomas Gleixner  Steven Rostedt, since they might have a few
  ideas...
 
  On Jun 18, 2014, at 11:07 PM, Grant Likely wrote:
 
  Hi Nathan and Tyrel,
 
  I'm looking into lifecycle issues on nodes modified by OF_DYNAMIC, and
  I'm hoping you can help me. Right now, pseries seems to be the only
  user of OF_DYNAMIC, but making OF_DYNAMIC work has a huge impact on
  the entire kernel because it requires all DT code to manage reference
  counting with iterating over nodes. Most users simply get it wrong.
  Pantelis did some investigation and found that the reference counts on
  a running kernel are all over the place. I have my doubts that any
  code really gets it right.
 
  The problem is that users need to know when it is appropriate to call
  of_node_get()/of_node_put(). All list traversals that exit early need
  an extra call to of_node_put(), and code that is searching for a node
  in the tree and holding a reference to it needs to call of_node_get().
 
 
  In hindsight it appears that drivers just can't get the lifecycle right.
  So we need to simplify things.
 
  I've got a few pseries questions:
  - What are the changes being requested by pseries firmware? Is it only
  CPUs and memory nodes, or does it manipulate things all over the tree?
  - How frequent are the changes? How many changes would be likely over
  the runtime of the system?
  - Are you able to verify that removed nodes are actually able to be
  freed correctly? Do you have any testcases for node removal?
 
  I'm thinking very seriously about changing the locking semantics of DT
  code entirely so that most users never have to worry about
  of_node_get/put at all. If the DT code is switched to use rcu
  primitives for tree iteration (which also means making DT code use
  list_head, something I'm already investigating), then instead of
  trying to figure out of_node_get/put rules, callers could use
  rcu_read_lock()/rcu_read_unlock() to protect the region that is
  searching over nodes, and only call of_node_get() if the node pointer
  is needed outside the rcu read-side lock.
 
  I'd really like to be rid of the node reference counting entirely, but
  I can't figure out a way of doing that safely, so I'd settle for
  making it a lot easier to get correct.
 
 
  Since we're going about changing things, how about that devtree_lock?
  
  I believe rcu would pretty much eliminate the devtree_lock entirely. All
  modifiers would need to grab a mutex to ensure there is only one writer
  at any given time, but readers would have free reign to parse the tree
  however they like.
  
  DT writers would have to follow some strict rules about how to handle
  nodes that are removed (ie. don't modify or of_node_put() them until
  after rcu is syncronized), but the number of writers is very small and
  we have control of all of them.
  
  We're using a raw_spinlock and we're always taking the lock with
  interrupts disabled.
 
  If we're going to make DT changes frequently during normal runtime
  and not only during boot time, those are bad for any kind of real-time
  performance.
 
  So the question is, do we really have code that access the live tree
  during atomic sections?  Is that something we want? Enforcing this
  will make our lives easier, and we'll get the change to replace
  that spinlock with a mutex.
  
  Yes, I believe the powerpc CPU hotplug code accesses the DT in atomic
  sections. I cannot put my finger on the exact code however. Nathan might
  know better. But, if I'm right, the whole problem goes away with RCU.
 
 I went back through the cpu hotplug code. we do update the DT during cpu
 hotplug but I don't see it happening during atomic sections.
 
 The code is in arch/powerpc/platforms/pseries/dlpar.c

Great, thanks,

By the way, notifiers currently get sent before any updates are applied
to the tree. I want to change it so that the notifier gets sent
afterwards. Does that work for you? I've looked through all the users
and aside from a stupid block of code in arch/powerpc/kernel/prom.c
which does things that should be done by of_attach_node(), it looks like
everything should be fine.

g.
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: OF_DYNAMIC node lifecycle

2014-06-25 Thread Grant Likely
On Tue, 24 Jun 2014 15:10:55 -0500, Nathan Fontenot nf...@austin.ibm.com 
wrote:
 On 06/23/2014 09:48 AM, Grant Likely wrote:
  On Thu, 19 Jun 2014 10:26:15 -0500, Nathan Fontenot nf...@austin.ibm.com 
  wrote:
  On 06/18/2014 03:07 PM, Grant Likely wrote:
  Hi Nathan and Tyrel,
 
  I'm looking into lifecycle issues on nodes modified by OF_DYNAMIC, and
  I'm hoping you can help me. Right now, pseries seems to be the only
  user of OF_DYNAMIC, but making OF_DYNAMIC work has a huge impact on
  the entire kernel because it requires all DT code to manage reference
  counting with iterating over nodes. Most users simply get it wrong.
  Pantelis did some investigation and found that the reference counts on
  a running kernel are all over the place. I have my doubts that any
  code really gets it right.
 
  The problem is that users need to know when it is appropriate to call
  of_node_get()/of_node_put(). All list traversals that exit early need
  an extra call to of_node_put(), and code that is searching for a node
  in the tree and holding a reference to it needs to call of_node_get().
 
  I've got a few pseries questions:
  - What are the changes being requested by pseries firmware? Is it only
  CPUs and memory nodes, or does it manipulate things all over the tree?
 
  The short answer, everything.
  
  :-)
  
  For pseries the two big actions that can change the device tree are
  adding/removing resources and partition migration.
 
  The most frequent updates to the device tree happen during resource
  (cpu, memory, and pci/phb) add and remove. During this process we add
  and remove the node and its properties from the device tree.
  - For memory on newer systems this just involves updating the
ibm,dynamic-reconfiguration-memory/ibm,dynamic-memory property. Older
firmware levels add and remove the memroy@XXX nodes and their properties.
  - For cpus the cpus/PowerPC,POWER nodes and its properties are added
or removed
  - For pci/phb the pci@X nodes and properties are added/removed.
 
  The less frequent operation of live partition migration (and 
  suspend/resume)
  can update just about anything in the device tree. When this occurs and the
  systems starts after being migrated (or waking up after a suspend) we make
  a call to firmware to get updates to the device tree for the new hardware
  we are running on.
   
  - How frequent are the changes? How many changes would be likely over
  the runtime of the system?
 
  This can happen frequently.
  
  Thanks, that is exactly the information that I want. I'm not so much
  concerned with the addition or removal of nodes/properties, which is
  actually pretty easy to handle. It is the lifecycle of allocations on
  dynamic nodes that causes heartburn.
  
  - Are you able to verify that removed nodes are actually able to be
  freed correctly? Do you have any testcases for node removal?
 
  I have always tested this by doing resource add/remove, usually cpu and 
  memory
  since it is the easiest.
  
  Is that just testing the functionality, or do you have tests that check
  if the memory gets freed?
 
 In general it's just functionality testing.
 
  
  I'm thinking very seriously about changing the locking semantics of DT
  code entirely so that most users never have to worry about
  of_node_get/put at all. If the DT code is switched to use rcu
  primitives for tree iteration (which also means making DT code use
  list_head, something I'm already investigating), then instead of
  trying to figure out of_node_get/put rules, callers could use
  rcu_read_lock()/rcu_read_unlock() to protect the region that is
  searching over nodes, and only call of_node_get() if the node pointer
  is needed outside the rcu read-side lock.
 
 
  This sounds good. I like just taking the rcu lock around accessing the DT.
  Do we have many places where DT node pointers are held that require
  keeping the of_node_get/put calls? If this did exist perhaps we could
  update those places to look up the DT node every time instead of
  holding on to the pointer. We could just get rid of the reference counting
  altogether then.
  
  There are a few, but I would be happy to restrict reference counting to
  only those locations. Most places will decode the DT data, and then
  throw away the reference. We /might/ even be able to do rcu_lock/unlock
  around the entire probe path which would make it transparent to all
  device drivers.
  
  I'd really like to be rid of the node reference counting entirely, but
  I can't figure out a way of doing that safely, so I'd settle for
  making it a lot easier to get correct.
 
 
  heh! I have often thought about adding reference counting to device tree
  properties.
  
  You horrible, horrible man.
 
 Yes. I are evil :)
 
 After looking again the work needed to add reference counts to properties
 would be huge. The few properties I am concerned with are specific to powerpc
 so perhaps just adding an arch specific lock around updating 

Re: OF_DYNAMIC node lifecycle

2014-06-24 Thread Nathan Fontenot
On 06/23/2014 09:58 AM, Grant Likely wrote:
 On Thu, 19 Jun 2014 11:33:20 +0300, Pantelis Antoniou 
 pantelis.anton...@konsulko.com wrote:
 Hi Grant,

 CCing Thomas Gleixner  Steven Rostedt, since they might have a few
 ideas...

 On Jun 18, 2014, at 11:07 PM, Grant Likely wrote:

 Hi Nathan and Tyrel,

 I'm looking into lifecycle issues on nodes modified by OF_DYNAMIC, and
 I'm hoping you can help me. Right now, pseries seems to be the only
 user of OF_DYNAMIC, but making OF_DYNAMIC work has a huge impact on
 the entire kernel because it requires all DT code to manage reference
 counting with iterating over nodes. Most users simply get it wrong.
 Pantelis did some investigation and found that the reference counts on
 a running kernel are all over the place. I have my doubts that any
 code really gets it right.

 The problem is that users need to know when it is appropriate to call
 of_node_get()/of_node_put(). All list traversals that exit early need
 an extra call to of_node_put(), and code that is searching for a node
 in the tree and holding a reference to it needs to call of_node_get().


 In hindsight it appears that drivers just can't get the lifecycle right.
 So we need to simplify things.

 I've got a few pseries questions:
 - What are the changes being requested by pseries firmware? Is it only
 CPUs and memory nodes, or does it manipulate things all over the tree?
 - How frequent are the changes? How many changes would be likely over
 the runtime of the system?
 - Are you able to verify that removed nodes are actually able to be
 freed correctly? Do you have any testcases for node removal?

 I'm thinking very seriously about changing the locking semantics of DT
 code entirely so that most users never have to worry about
 of_node_get/put at all. If the DT code is switched to use rcu
 primitives for tree iteration (which also means making DT code use
 list_head, something I'm already investigating), then instead of
 trying to figure out of_node_get/put rules, callers could use
 rcu_read_lock()/rcu_read_unlock() to protect the region that is
 searching over nodes, and only call of_node_get() if the node pointer
 is needed outside the rcu read-side lock.

 I'd really like to be rid of the node reference counting entirely, but
 I can't figure out a way of doing that safely, so I'd settle for
 making it a lot easier to get correct.


 Since we're going about changing things, how about that devtree_lock?
 
 I believe rcu would pretty much eliminate the devtree_lock entirely. All
 modifiers would need to grab a mutex to ensure there is only one writer
 at any given time, but readers would have free reign to parse the tree
 however they like.
 
 DT writers would have to follow some strict rules about how to handle
 nodes that are removed (ie. don't modify or of_node_put() them until
 after rcu is syncronized), but the number of writers is very small and
 we have control of all of them.
 
 We're using a raw_spinlock and we're always taking the lock with
 interrupts disabled.

 If we're going to make DT changes frequently during normal runtime
 and not only during boot time, those are bad for any kind of real-time
 performance.

 So the question is, do we really have code that access the live tree
 during atomic sections?  Is that something we want? Enforcing this
 will make our lives easier, and we'll get the change to replace
 that spinlock with a mutex.
 
 Yes, I believe the powerpc CPU hotplug code accesses the DT in atomic
 sections. I cannot put my finger on the exact code however. Nathan might
 know better. But, if I'm right, the whole problem goes away with RCU.

I went back through the cpu hotplug code. we do update the DT during cpu
hotplug but I don't see it happening during atomic sections.

The code is in arch/powerpc/platforms/pseries/dlpar.c

-Nathan

 
 The design with RCU is to switch struct device_node and struct property
 to use list_head and/or hlist_head with the _rcu accessors. They allow
 items to be removed from a list without syncronizing with readers. Right
 now we have two lists that need to be modified; the allnodes list and
 the sibling list. I *think* it will be fine for the two list removals to
 be non-atomic (there will be a brief period where the node can be found
 on one list, but not the other) because it is a transient state already
 accounted for in rcu read-side critical region.
 
 That said, I've also got a design to remove the allnodes list entirely
 and only work with the sibling list. I need to prototype this.
 
 We'll also need a transition plan to move to RCU. I think the existing
 iterators can be modified to do the rcu locking in-line, but still require
 the of_node_get/put stuff (basically, so existing code continue to works
 unchanged). Then we can add _rcu versions that drop the need for
 of_node_get/put(). When everything is converted, the old iterators can
 be dropped.
 
 g.
 

___
Linuxppc-dev mailing list

Re: OF_DYNAMIC node lifecycle

2014-06-24 Thread Nathan Fontenot
On 06/23/2014 09:48 AM, Grant Likely wrote:
 On Thu, 19 Jun 2014 10:26:15 -0500, Nathan Fontenot nf...@austin.ibm.com 
 wrote:
 On 06/18/2014 03:07 PM, Grant Likely wrote:
 Hi Nathan and Tyrel,

 I'm looking into lifecycle issues on nodes modified by OF_DYNAMIC, and
 I'm hoping you can help me. Right now, pseries seems to be the only
 user of OF_DYNAMIC, but making OF_DYNAMIC work has a huge impact on
 the entire kernel because it requires all DT code to manage reference
 counting with iterating over nodes. Most users simply get it wrong.
 Pantelis did some investigation and found that the reference counts on
 a running kernel are all over the place. I have my doubts that any
 code really gets it right.

 The problem is that users need to know when it is appropriate to call
 of_node_get()/of_node_put(). All list traversals that exit early need
 an extra call to of_node_put(), and code that is searching for a node
 in the tree and holding a reference to it needs to call of_node_get().

 I've got a few pseries questions:
 - What are the changes being requested by pseries firmware? Is it only
 CPUs and memory nodes, or does it manipulate things all over the tree?

 The short answer, everything.
 
 :-)
 
 For pseries the two big actions that can change the device tree are
 adding/removing resources and partition migration.

 The most frequent updates to the device tree happen during resource
 (cpu, memory, and pci/phb) add and remove. During this process we add
 and remove the node and its properties from the device tree.
 - For memory on newer systems this just involves updating the
   ibm,dynamic-reconfiguration-memory/ibm,dynamic-memory property. Older
   firmware levels add and remove the memroy@XXX nodes and their properties.
 - For cpus the cpus/PowerPC,POWER nodes and its properties are added
   or removed
 - For pci/phb the pci@X nodes and properties are added/removed.

 The less frequent operation of live partition migration (and suspend/resume)
 can update just about anything in the device tree. When this occurs and the
 systems starts after being migrated (or waking up after a suspend) we make
 a call to firmware to get updates to the device tree for the new hardware
 we are running on.
  
 - How frequent are the changes? How many changes would be likely over
 the runtime of the system?

 This can happen frequently.
 
 Thanks, that is exactly the information that I want. I'm not so much
 concerned with the addition or removal of nodes/properties, which is
 actually pretty easy to handle. It is the lifecycle of allocations on
 dynamic nodes that causes heartburn.
 
 - Are you able to verify that removed nodes are actually able to be
 freed correctly? Do you have any testcases for node removal?

 I have always tested this by doing resource add/remove, usually cpu and 
 memory
 since it is the easiest.
 
 Is that just testing the functionality, or do you have tests that check
 if the memory gets freed?

In general it's just functionality testing.

 
 I'm thinking very seriously about changing the locking semantics of DT
 code entirely so that most users never have to worry about
 of_node_get/put at all. If the DT code is switched to use rcu
 primitives for tree iteration (which also means making DT code use
 list_head, something I'm already investigating), then instead of
 trying to figure out of_node_get/put rules, callers could use
 rcu_read_lock()/rcu_read_unlock() to protect the region that is
 searching over nodes, and only call of_node_get() if the node pointer
 is needed outside the rcu read-side lock.


 This sounds good. I like just taking the rcu lock around accessing the DT.
 Do we have many places where DT node pointers are held that require
 keeping the of_node_get/put calls? If this did exist perhaps we could
 update those places to look up the DT node every time instead of
 holding on to the pointer. We could just get rid of the reference counting
 altogether then.
 
 There are a few, but I would be happy to restrict reference counting to
 only those locations. Most places will decode the DT data, and then
 throw away the reference. We /might/ even be able to do rcu_lock/unlock
 around the entire probe path which would make it transparent to all
 device drivers.
 
 I'd really like to be rid of the node reference counting entirely, but
 I can't figure out a way of doing that safely, so I'd settle for
 making it a lot easier to get correct.


 heh! I have often thought about adding reference counting to device tree
 properties.
 
 You horrible, horrible man.

Yes. I are evil :)

After looking again the work needed to add reference counts to properties
would be huge. The few properties I am concerned with are specific to powerpc
so perhaps just adding an arch specific lock around updating those
properties would work.

-Nathan

 
 I don't really want to but there are some properties that can
 get updated frequently (namely the one mentioned above for memory) that
 can also get 

Re: OF_DYNAMIC node lifecycle

2014-06-23 Thread Grant Likely
On Thu, 19 Jun 2014 10:26:15 -0500, Nathan Fontenot nf...@austin.ibm.com 
wrote:
 On 06/18/2014 03:07 PM, Grant Likely wrote:
  Hi Nathan and Tyrel,
  
  I'm looking into lifecycle issues on nodes modified by OF_DYNAMIC, and
  I'm hoping you can help me. Right now, pseries seems to be the only
  user of OF_DYNAMIC, but making OF_DYNAMIC work has a huge impact on
  the entire kernel because it requires all DT code to manage reference
  counting with iterating over nodes. Most users simply get it wrong.
  Pantelis did some investigation and found that the reference counts on
  a running kernel are all over the place. I have my doubts that any
  code really gets it right.
  
  The problem is that users need to know when it is appropriate to call
  of_node_get()/of_node_put(). All list traversals that exit early need
  an extra call to of_node_put(), and code that is searching for a node
  in the tree and holding a reference to it needs to call of_node_get().
  
  I've got a few pseries questions:
  - What are the changes being requested by pseries firmware? Is it only
  CPUs and memory nodes, or does it manipulate things all over the tree?
 
 The short answer, everything.

:-)

 For pseries the two big actions that can change the device tree are
 adding/removing resources and partition migration.
 
 The most frequent updates to the device tree happen during resource
 (cpu, memory, and pci/phb) add and remove. During this process we add
 and remove the node and its properties from the device tree.
 - For memory on newer systems this just involves updating the
   ibm,dynamic-reconfiguration-memory/ibm,dynamic-memory property. Older
   firmware levels add and remove the memroy@XXX nodes and their properties.
 - For cpus the cpus/PowerPC,POWER nodes and its properties are added
   or removed
 - For pci/phb the pci@X nodes and properties are added/removed.
 
 The less frequent operation of live partition migration (and suspend/resume)
 can update just about anything in the device tree. When this occurs and the
 systems starts after being migrated (or waking up after a suspend) we make
 a call to firmware to get updates to the device tree for the new hardware
 we are running on.
  
  - How frequent are the changes? How many changes would be likely over
  the runtime of the system?
 
 This can happen frequently.

Thanks, that is exactly the information that I want. I'm not so much
concerned with the addition or removal of nodes/properties, which is
actually pretty easy to handle. It is the lifecycle of allocations on
dynamic nodes that causes heartburn.

  - Are you able to verify that removed nodes are actually able to be
  freed correctly? Do you have any testcases for node removal?
 
 I have always tested this by doing resource add/remove, usually cpu and memory
 since it is the easiest.

Is that just testing the functionality, or do you have tests that check
if the memory gets freed?

  I'm thinking very seriously about changing the locking semantics of DT
  code entirely so that most users never have to worry about
  of_node_get/put at all. If the DT code is switched to use rcu
  primitives for tree iteration (which also means making DT code use
  list_head, something I'm already investigating), then instead of
  trying to figure out of_node_get/put rules, callers could use
  rcu_read_lock()/rcu_read_unlock() to protect the region that is
  searching over nodes, and only call of_node_get() if the node pointer
  is needed outside the rcu read-side lock.
  
 
 This sounds good. I like just taking the rcu lock around accessing the DT.
 Do we have many places where DT node pointers are held that require
 keeping the of_node_get/put calls? If this did exist perhaps we could
 update those places to look up the DT node every time instead of
 holding on to the pointer. We could just get rid of the reference counting
 altogether then.

There are a few, but I would be happy to restrict reference counting to
only those locations. Most places will decode the DT data, and then
throw away the reference. We /might/ even be able to do rcu_lock/unlock
around the entire probe path which would make it transparent to all
device drivers.

  I'd really like to be rid of the node reference counting entirely, but
  I can't figure out a way of doing that safely, so I'd settle for
  making it a lot easier to get correct.
  
 
 heh! I have often thought about adding reference counting to device tree
 properties.

You horrible, horrible man.

 I don't really want to but there are some properties that can
 get updated frequently (namely the one mentioned above for memory) that
 can also get pretty big, especially on systems with a lot of memory. We
 never free the memory for old versions of a device tree property. This is
 a pretty minor issue though and probably best suited for a separate
 discussion after resolving this. 

We might be able to do some in-place modification of properties if the
size of the property doesn't change. 

Re: OF_DYNAMIC node lifecycle

2014-06-23 Thread Grant Likely
On Thu, 19 Jun 2014 11:33:20 +0300, Pantelis Antoniou 
pantelis.anton...@konsulko.com wrote:
 Hi Grant,
 
 CCing Thomas Gleixner  Steven Rostedt, since they might have a few
 ideas...
 
 On Jun 18, 2014, at 11:07 PM, Grant Likely wrote:
 
  Hi Nathan and Tyrel,
  
  I'm looking into lifecycle issues on nodes modified by OF_DYNAMIC, and
  I'm hoping you can help me. Right now, pseries seems to be the only
  user of OF_DYNAMIC, but making OF_DYNAMIC work has a huge impact on
  the entire kernel because it requires all DT code to manage reference
  counting with iterating over nodes. Most users simply get it wrong.
  Pantelis did some investigation and found that the reference counts on
  a running kernel are all over the place. I have my doubts that any
  code really gets it right.
  
  The problem is that users need to know when it is appropriate to call
  of_node_get()/of_node_put(). All list traversals that exit early need
  an extra call to of_node_put(), and code that is searching for a node
  in the tree and holding a reference to it needs to call of_node_get().
  
 
 In hindsight it appears that drivers just can't get the lifecycle right.
 So we need to simplify things.
 
  I've got a few pseries questions:
  - What are the changes being requested by pseries firmware? Is it only
  CPUs and memory nodes, or does it manipulate things all over the tree?
  - How frequent are the changes? How many changes would be likely over
  the runtime of the system?
  - Are you able to verify that removed nodes are actually able to be
  freed correctly? Do you have any testcases for node removal?
  
  I'm thinking very seriously about changing the locking semantics of DT
  code entirely so that most users never have to worry about
  of_node_get/put at all. If the DT code is switched to use rcu
  primitives for tree iteration (which also means making DT code use
  list_head, something I'm already investigating), then instead of
  trying to figure out of_node_get/put rules, callers could use
  rcu_read_lock()/rcu_read_unlock() to protect the region that is
  searching over nodes, and only call of_node_get() if the node pointer
  is needed outside the rcu read-side lock.
  
  I'd really like to be rid of the node reference counting entirely, but
  I can't figure out a way of doing that safely, so I'd settle for
  making it a lot easier to get correct.
  
 
 Since we're going about changing things, how about that devtree_lock?

I believe rcu would pretty much eliminate the devtree_lock entirely. All
modifiers would need to grab a mutex to ensure there is only one writer
at any given time, but readers would have free reign to parse the tree
however they like.

DT writers would have to follow some strict rules about how to handle
nodes that are removed (ie. don't modify or of_node_put() them until
after rcu is syncronized), but the number of writers is very small and
we have control of all of them.

 We're using a raw_spinlock and we're always taking the lock with
 interrupts disabled.
 
 If we're going to make DT changes frequently during normal runtime
 and not only during boot time, those are bad for any kind of real-time
 performance.
 
 So the question is, do we really have code that access the live tree
 during atomic sections?  Is that something we want? Enforcing this
 will make our lives easier, and we'll get the change to replace
 that spinlock with a mutex.

Yes, I believe the powerpc CPU hotplug code accesses the DT in atomic
sections. I cannot put my finger on the exact code however. Nathan might
know better. But, if I'm right, the whole problem goes away with RCU.

The design with RCU is to switch struct device_node and struct property
to use list_head and/or hlist_head with the _rcu accessors. They allow
items to be removed from a list without syncronizing with readers. Right
now we have two lists that need to be modified; the allnodes list and
the sibling list. I *think* it will be fine for the two list removals to
be non-atomic (there will be a brief period where the node can be found
on one list, but not the other) because it is a transient state already
accounted for in rcu read-side critical region.

That said, I've also got a design to remove the allnodes list entirely
and only work with the sibling list. I need to prototype this.

We'll also need a transition plan to move to RCU. I think the existing
iterators can be modified to do the rcu locking in-line, but still require
the of_node_get/put stuff (basically, so existing code continue to works
unchanged). Then we can add _rcu versions that drop the need for
of_node_get/put(). When everything is converted, the old iterators can
be dropped.

g.
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: OF_DYNAMIC node lifecycle

2014-06-23 Thread Pantelis Antoniou
Hi Grant,

On Jun 23, 2014, at 5:58 PM, Grant Likely wrote:

 On Thu, 19 Jun 2014 11:33:20 +0300, Pantelis Antoniou 
 pantelis.anton...@konsulko.com wrote:
 Hi Grant,
 
 CCing Thomas Gleixner  Steven Rostedt, since they might have a few
 ideas...
 
 On Jun 18, 2014, at 11:07 PM, Grant Likely wrote:
 
 Hi Nathan and Tyrel,
 
 I'm looking into lifecycle issues on nodes modified by OF_DYNAMIC, and
 I'm hoping you can help me. Right now, pseries seems to be the only
 user of OF_DYNAMIC, but making OF_DYNAMIC work has a huge impact on
 the entire kernel because it requires all DT code to manage reference
 counting with iterating over nodes. Most users simply get it wrong.
 Pantelis did some investigation and found that the reference counts on
 a running kernel are all over the place. I have my doubts that any
 code really gets it right.
 
 The problem is that users need to know when it is appropriate to call
 of_node_get()/of_node_put(). All list traversals that exit early need
 an extra call to of_node_put(), and code that is searching for a node
 in the tree and holding a reference to it needs to call of_node_get().
 
 
 In hindsight it appears that drivers just can't get the lifecycle right.
 So we need to simplify things.
 
 I've got a few pseries questions:
 - What are the changes being requested by pseries firmware? Is it only
 CPUs and memory nodes, or does it manipulate things all over the tree?
 - How frequent are the changes? How many changes would be likely over
 the runtime of the system?
 - Are you able to verify that removed nodes are actually able to be
 freed correctly? Do you have any testcases for node removal?
 
 I'm thinking very seriously about changing the locking semantics of DT
 code entirely so that most users never have to worry about
 of_node_get/put at all. If the DT code is switched to use rcu
 primitives for tree iteration (which also means making DT code use
 list_head, something I'm already investigating), then instead of
 trying to figure out of_node_get/put rules, callers could use
 rcu_read_lock()/rcu_read_unlock() to protect the region that is
 searching over nodes, and only call of_node_get() if the node pointer
 is needed outside the rcu read-side lock.
 
 I'd really like to be rid of the node reference counting entirely, but
 I can't figure out a way of doing that safely, so I'd settle for
 making it a lot easier to get correct.
 
 
 Since we're going about changing things, how about that devtree_lock?
 
 I believe rcu would pretty much eliminate the devtree_lock entirely. All
 modifiers would need to grab a mutex to ensure there is only one writer
 at any given time, but readers would have free reign to parse the tree
 however they like.
 
 DT writers would have to follow some strict rules about how to handle
 nodes that are removed (ie. don't modify or of_node_put() them until
 after rcu is syncronized), but the number of writers is very small and
 we have control of all of them.
 

There's one final nitpick with transactions; we might need another 
node/property flag marking the 'in-progress' state so that
can be skipped by iterators, but in general this looks good.

 We're using a raw_spinlock and we're always taking the lock with
 interrupts disabled.
 
 If we're going to make DT changes frequently during normal runtime
 and not only during boot time, those are bad for any kind of real-time
 performance.
 
 So the question is, do we really have code that access the live tree
 during atomic sections?  Is that something we want? Enforcing this
 will make our lives easier, and we'll get the change to replace
 that spinlock with a mutex.
 
 Yes, I believe the powerpc CPU hotplug code accesses the DT in atomic
 sections. I cannot put my finger on the exact code however. Nathan might
 know better. But, if I'm right, the whole problem goes away with RCU.
 

This is just bad. Why would you need to that?

 The design with RCU is to switch struct device_node and struct property
 to use list_head and/or hlist_head with the _rcu accessors. They allow
 items to be removed from a list without syncronizing with readers. Right
 now we have two lists that need to be modified; the allnodes list and
 the sibling list. I *think* it will be fine for the two list removals to
 be non-atomic (there will be a brief period where the node can be found
 on one list, but not the other) because it is a transient state already
 accounted for in rcu read-side critical region.
 

See above about transient states.

 That said, I've also got a design to remove the allnodes list entirely
 and only work with the sibling list. I need to prototype this.
 

In my original patchset for the overlays I used a single node as a root
and didn't deal with allnodes at all. So it can definitely can work.

 We'll also need a transition plan to move to RCU. I think the existing
 iterators can be modified to do the rcu locking in-line, but still require
 the of_node_get/put stuff (basically, so existing code continue 

Re: OF_DYNAMIC node lifecycle

2014-06-23 Thread Grant Likely
On Mon, 23 Jun 2014 18:26:04 +0300, Pantelis Antoniou 
pantelis.anton...@konsulko.com wrote:
 On Jun 23, 2014, at 5:58 PM, Grant Likely wrote:
  We'll also need a transition plan to move to RCU. I think the existing
  iterators can be modified to do the rcu locking in-line, but still require
  the of_node_get/put stuff (basically, so existing code continue to works
  unchanged). Then we can add _rcu versions that drop the need for
  of_node_get/put(). When everything is converted, the old iterators can
  be dropped.
  
 
 Eventually yes. We're not close to that yet. I'd be happy if we get the 
 lifecycle issues fixed right now (with of_node_get/put) and plan ahead.
 I am sure we missed a few things, which we will come across.

If we agree on the plan to keep of_node_get/put, but strongly reduce the
usage by switching to RCU, then I'm generally okay with proceeding with
the overlay feature because I can see how it would work in the new
model.

g.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: OF_DYNAMIC node lifecycle

2014-06-19 Thread Pantelis Antoniou
Hi Grant,

CCing Thomas Gleixner  Steven Rostedt, since they might have a few
ideas...

On Jun 18, 2014, at 11:07 PM, Grant Likely wrote:

 Hi Nathan and Tyrel,
 
 I'm looking into lifecycle issues on nodes modified by OF_DYNAMIC, and
 I'm hoping you can help me. Right now, pseries seems to be the only
 user of OF_DYNAMIC, but making OF_DYNAMIC work has a huge impact on
 the entire kernel because it requires all DT code to manage reference
 counting with iterating over nodes. Most users simply get it wrong.
 Pantelis did some investigation and found that the reference counts on
 a running kernel are all over the place. I have my doubts that any
 code really gets it right.
 
 The problem is that users need to know when it is appropriate to call
 of_node_get()/of_node_put(). All list traversals that exit early need
 an extra call to of_node_put(), and code that is searching for a node
 in the tree and holding a reference to it needs to call of_node_get().
 

In hindsight it appears that drivers just can't get the lifecycle right.
So we need to simplify things.

 I've got a few pseries questions:
 - What are the changes being requested by pseries firmware? Is it only
 CPUs and memory nodes, or does it manipulate things all over the tree?
 - How frequent are the changes? How many changes would be likely over
 the runtime of the system?
 - Are you able to verify that removed nodes are actually able to be
 freed correctly? Do you have any testcases for node removal?
 
 I'm thinking very seriously about changing the locking semantics of DT
 code entirely so that most users never have to worry about
 of_node_get/put at all. If the DT code is switched to use rcu
 primitives for tree iteration (which also means making DT code use
 list_head, something I'm already investigating), then instead of
 trying to figure out of_node_get/put rules, callers could use
 rcu_read_lock()/rcu_read_unlock() to protect the region that is
 searching over nodes, and only call of_node_get() if the node pointer
 is needed outside the rcu read-side lock.
 
 I'd really like to be rid of the node reference counting entirely, but
 I can't figure out a way of doing that safely, so I'd settle for
 making it a lot easier to get correct.
 

Since we're going about changing things, how about that devtree_lock?

We're using a raw_spinlock and we're always taking the lock with
interrupts disabled.

If we're going to make DT changes frequently during normal runtime
and not only during boot time, those are bad for any kind of real-time
performance.

So the question is, do we really have code that access the live tree
during atomic sections? Is that something we want? Enforcing this
will make our lives easier, and we'll get the change to replace
that spinlock with a mutex.

 
 Thoughts?
 
 g.

Regards

-- Pantelis

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: OF_DYNAMIC node lifecycle

2014-06-19 Thread Nathan Fontenot
On 06/18/2014 03:07 PM, Grant Likely wrote:
 Hi Nathan and Tyrel,
 
 I'm looking into lifecycle issues on nodes modified by OF_DYNAMIC, and
 I'm hoping you can help me. Right now, pseries seems to be the only
 user of OF_DYNAMIC, but making OF_DYNAMIC work has a huge impact on
 the entire kernel because it requires all DT code to manage reference
 counting with iterating over nodes. Most users simply get it wrong.
 Pantelis did some investigation and found that the reference counts on
 a running kernel are all over the place. I have my doubts that any
 code really gets it right.
 
 The problem is that users need to know when it is appropriate to call
 of_node_get()/of_node_put(). All list traversals that exit early need
 an extra call to of_node_put(), and code that is searching for a node
 in the tree and holding a reference to it needs to call of_node_get().
 
 I've got a few pseries questions:
 - What are the changes being requested by pseries firmware? Is it only
 CPUs and memory nodes, or does it manipulate things all over the tree?

The short answer, everything.

For pseries the two big actions that can change the device tree are
adding/removing resources and partition migration.

The most frequent updates to the device tree happen during resource
(cpu, memory, and pci/phb) add and remove. During this process we add
and remove the node and its properties from the device tree.
- For memory on newer systems this just involves updating the
  ibm,dynamic-reconfiguration-memory/ibm,dynamic-memory property. Older
  firmware levels add and remove the memroy@XXX nodes and their properties.
- For cpus the cpus/PowerPC,POWER nodes and its properties are added
  or removed
- For pci/phb the pci@X nodes and properties are added/removed.

The less frequent operation of live partition migration (and suspend/resume)
can update just about anything in the device tree. When this occurs and the
systems starts after being migrated (or waking up after a suspend) we make
a call to firmware to get updates to the device tree for the new hardware
we are running on.
 
 - How frequent are the changes? How many changes would be likely over
 the runtime of the system?

This can happen frequently.

 - Are you able to verify that removed nodes are actually able to be
 freed correctly? Do you have any testcases for node removal?

I have always tested this by doing resource add/remove, usually cpu and memory
since it is the easiest.

 
 I'm thinking very seriously about changing the locking semantics of DT
 code entirely so that most users never have to worry about
 of_node_get/put at all. If the DT code is switched to use rcu
 primitives for tree iteration (which also means making DT code use
 list_head, something I'm already investigating), then instead of
 trying to figure out of_node_get/put rules, callers could use
 rcu_read_lock()/rcu_read_unlock() to protect the region that is
 searching over nodes, and only call of_node_get() if the node pointer
 is needed outside the rcu read-side lock.
 

This sounds good. I like just taking the rcu lock around accessing the DT.
Do we have many places where DT node pointers are held that require
keeping the of_node_get/put calls? If this did exist perhaps we could
update those places to look up the DT node every time instead of
holding on to the pointer. We could just get rid of the reference counting
altogether then.

 I'd really like to be rid of the node reference counting entirely, but
 I can't figure out a way of doing that safely, so I'd settle for
 making it a lot easier to get correct.
 

heh! I have often thought about adding reference counting to device tree
properties. I don't really want to but there are some properties that can
get updated frequently (namely the one mentioned above for memory) that
can also get pretty big, especially on systems with a lot of memory. We
never free the memory for old versions of a device tree property. This is
a pretty minor issue though and probably best suited for a separate
discussion after resolving this. 

Other than pseries, who else does dynamic device tree updating? Are we the
only ones?

-Nathan

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev