Hi Mark, Shame on me... I didn't actually do a lot of progress, but I'm still willing to do it.
I was sort of waiting for the whole RuleFest thing finish, so that I could bother you guys with questions on IRC. I will try to get in touch with you tomorrow. Leo. On Wed, Oct 27, 2010 at 11:46 PM, Mark Proctor <[email protected]>wrote: > Anyone made progress on this? > > Mark > > On 20/08/2010 17:02, Mark Proctor wrote: > > In an effort to help encourage those thinking of learning more about the > internals of rule engines. I have made a document on implementating left and > right unlinking. I describe the initial paper in terms relevant to Drools > users, and then how that can be implemented in Drools and a series of > enhancements over the original paper. The task is actually surprisingly > simple and you only need to learn a very small part of the Drools > implementation to do it, as such it's a great getting started task. For > really large stateful systems of hundreds or even thousands of rules and > hundreds of thousands of facts it should save significant amounts of memory. > http://blog.athico.com/2010/08/left-and-right-unlinking-community.html > > Any takers? > > Mark > > Introduction > The following paper describes Left and Right unlinking enhancements for > Rete based networks: > http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.45.6246 > > A rete based rule engine consists of two parts of the network, the alpha > nodes and the beta nodes. When an object is first inserted into the engine > it is discriminated against by the object type Node, this is a one input and > one output node. From there it may be further discriminated against by alpha > nodes that constrain on literal values before reaching the right input of a > join node in the beta part of the network. Join nodes have two inputs, left > and right. The right input receives propagations consisting of a single > object from the alpha part of the network. The left input receives > propagations consisting of 1 or more objects, from the parent beta node. We > refer to these propagating objects as LeftTuple and RightTuple, other > engines also use the terms tokens or partial matches. When a tuple > propagation reaches a left or right input it's stored in that inputs memory > and it attempts to join with all possible tuples on the opposite side. If > there are no tuples on the opposite side then no join can happen and the > tuple just waits in the node's memory until a propagation from the opposite > side attempts to join with it. If a given. It would be better if the engine > could avoid populating that node's memory until both sides have tuples. Left > and right unlinking are solutions to this problem. > > The paper proposes that a node can either be left unlinked or right > unlinked, but not both, as then the rule would be completely disconnected > from the network. Unlinking an input means that it will not receive any > propagations and that the node's memory for that input is not populated, > saving memory space. When the opposite side, which is still linked, receives > a propagation the unlinked side is linked back in and receives all the none > propagated tuples. As both sides cannot be unlinked, the paper describes a > simple heuristic for choosing which side to unlink. Which ever side becomes > empty first, then unlink the other. It says that on start up just > arbitrarily chose to unlink one side as default. The initial hit from > choosing the wrong side will be negligible, as the heuristic corrects this > after the first set of propagations. > > If the left input becomes empty the right input is unlink, thus clearing > the right input's memory too. The moment the left input receives a > propagation it re-attaches the right input fully populating it's memory. The > node can then attempt joins as normal. Vice-versa if the right input becomes > empty it unlinks the left input. The moment the right input receives a > propagation it re-attaches the left input fully populating it's memory so > that the node can attempt to join as normal. > > Implementing Left and Right Unlinking for shared Knowledge Bases > > The description of unlinking in the paper won't work for Drools or for > other rule engines that share the knowledge base between multiple sessions. > In Drools the session data is decoupled from the main knowledge base and > multiple sessions can share the same knowledge base. The paper above > describes systems where the session data is tightly coupled to the knowledge > base and the knowledge base has only a single session. In shared systems a > node input that is empty for one session might not be empty for another. > Instead of physically unlinking the nodes, as described in the paper, an > integer value can be used on the session's node memory that indicates if the > node is unlinked for left, right or both inputs. When the propagating node > attempts to propagate instead of just creating a left or right tuple and > pushing it into the node. It'll first retrieve the node's memory and only > create the tuple and propagate if it's linked. > > This is great as it also avoids creating tuple objects that would just be > discarded afterwards as there would be nothing to join with, making things > lighter on the GC. However it means the engine looks up the node memory > twice, once before propagating to the node and also inside of the node as it > attempt to do joins. Instead the node memory should be looked up once, prior > to propagating and then passed as an argument, avoiding the double lookup. > > Traditional Rete has memory per alpha node, for each literal constraint, in > the network. Drools does not have alpha memory, instead facts are pulled > from the object type node. This means that facts may needlessly evaluate in > the alpha part of the network, only to be refused addition to the node > memory afterwards. Rete supports something called “node sharing”, where > multiple rules with similar constructs use the same nodes in the network. > For this reason shared nodes cannot easily be unlinked. As a compromise when > the alpha node is no longer shared, the network can do a node memory lookup, > prior to doing the evaluation and check if that section of the network is > unlinked and avoid attempting the evaluation if it is. This allows for left > and right unlinking to be used in a engine such as Drools. > > Using Left and Right Unlinking at the Same Time > > The original paper describes an implantation in which a node cannot have > both the left and right inputs unlinked for the same node. Building on the > extension above to allow unlinking to work with a shared knowledge base the > initial linking status value can be set to both left and right being > unlinked. However in this initial state, where both sides are unlinked, the > leaf node's right input isn't just waiting for a left propagation so the > right can re-link itself (which it can't as the left is unlinked too). It's > also waiting to receive it's first propagation, when it does it will link > the left input back in. This will then tell it's parent node's right input > to also do the same, i.e. wait for it's first right input propagation and > link in the left when it happens. If it already has a right propagation > it'll just link in the left anyway. This will trickle up until the root is > finally linked in and propagations can happen as normally, and the rule's > nodes return to the above heuristics for when to link and unlink the nodes. > > Avoid Unnecessary Eager Propagations > > A rule always eagerly propagates all joins, regardless of whether the child > node can undertake joins too, for instance of there is no propagates for the > leaf node then no rules can fire, and the eager propagations are wasted > work. Unlinking can be extended to try to prevent some level of eager > propagations. Should the leaf node become right unlinked and that right > input also become empty it will unlink the left too (so both sides are > unlinked) and go back to waiting for the first right propagation, at which > point it'll re-link the left. If the parent node also has it's right input > unlinked at the point that it's child node unlinks the left it will do this > too. It will repeat this up the chain until it reaches a node that has both > left and right linked in. This stops any further eager matching from > occurring that we know can't result in an activation until the leaf node has > at least one right input. > > Heuristics to Avoid Churn from Excessive and Unnecessary Unlinking > > The only case where left and right linking would be a bad idea is in > situations that would cause a "churn”. Churn is when a node with have a > large amount of right input memory is continually caused to be linked in and > linked out, forcing those nodes to be repeatedly populated which causes a > slow down. However heuristics can be used here too, to avoid unnecessary > unlinking. The first time an input becomes empty unlink the opposite and > store a time stamp (integer counter for fact handles from the WM). Then have > a minimum delta number, say 100. The next time it attempts to unlink, > calculate the delta of the current time stamp (integer counter on fact > handle) and the time stamp of the node which last unlinked (which was > recorded at the point of unlinking) if it's less than 100 then do nothing > and don't unlink until it's 100 or more. If it's 100 or more then unlink and > as well as storing the unlink time stamp, then take the delta of 100 or more > and apply a multiple (2, 3, 4 etc depending on how steep you want it to > rise, 3 is a good starting number) and store it. Such as if the delta is 100 > then store 300. The next time the node links and attempts to unlink it must > be a delta of 300 or more, the time after that 900 the time after that 2700. > > > _______________________________________________ > rules-dev mailing > [email protected]https://lists.jboss.org/mailman/listinfo/rules-dev > > > > _______________________________________________ > rules-dev mailing list > [email protected] > https://lists.jboss.org/mailman/listinfo/rules-dev > >
_______________________________________________ rules-dev mailing list [email protected] https://lists.jboss.org/mailman/listinfo/rules-dev
