Re: [rules-users] Improving Drools Memory Performance

Mark Proctor Wed, 21 Jul 2010 19:47:37 -0700

 On 22/07/2010 03:28, Jevon Wright wrote:

Hi Mark and Wolfgang,
Thank you for your replies! Comments below.
A bit of background: I am using Drools to take a given EMF model instance and insert new EObjects into the instance, according to the given rules. I try to perform inference top-down, so there is more than one iteration of insertion: as objects are inserted, the rules need to be re-evaluated. If I understand correctly, this means that I can't use a stateless session or the sequential option, because the working memory is changing with inserted facts.
The rules don't appear to insert directly, because I insert new objects into a queue instead (queue.add(object, drools)). Once rule evaluation is complete, I insert the contents of the queue into the existing working memory and fire all the rules again. I try to prevent the rules from modifying the working memory directly. This is also why all the rules are of the format (x, ..., y, not z => insert z).
This approach has a number of benefits. It finds inconsistencies in the rules, and it means rules have no order, because inserted facts don't affect the working memory immediately. It also allows me to detect infinite loops without restricting the number of times a rule can fire. This was described in our 2010 paper [1].
I don't think my implementation of this approach is causing the memory problem, but I could be wrong.
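The queue-then-reinsert cycle described above can be modelled in plain Java, without any Drools API, to make the control flow concrete. This is a minimal sketch under stated assumptions: facts are plain strings, each rule is a hypothetical function from a snapshot of the working memory to the facts it would queue, and 'QueuedCompletion', 'complete' and 'maxRounds' are illustrative names, not code from the paper or from Drools. A simple round limit stands in for loop detection here; how the paper detects loops is not shown.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

/** Hypothetical sketch of round-based completion: rules queue facts instead of inserting them. */
final class QueuedCompletion {

    /**
     * Each "rule" is modelled as a function from a read-only snapshot of the
     * working memory to the facts it would queue for insertion.
     */
    static Set<String> complete(Set<String> initial,
                                List<Function<Set<String>, Collection<String>>> rules,
                                int maxRounds) {
        Set<String> memory = new LinkedHashSet<>(initial);
        for (int round = 0; round < maxRounds; round++) {
            // Every rule sees the same snapshot, so the order of the rules does not matter.
            Set<String> snapshot = Collections.unmodifiableSet(new LinkedHashSet<>(memory));
            Set<String> queue = new LinkedHashSet<>();
            for (Function<Set<String>, Collection<String>> rule : rules) {
                queue.addAll(rule.apply(snapshot));
            }
            queue.removeAll(memory);   // keep only facts that are genuinely new ("not z")
            if (queue.isEmpty()) {
                return memory;         // fixpoint: no rule wants to insert anything new
            }
            memory.addAll(queue);      // flush the queue, then evaluate all rules again
        }
        // Hypothetical guard for this sketch only: a crude stand-in for real loop detection.
        throw new IllegalStateException("no fixpoint after " + maxRounds + " rounds");
    }
}
```

For example, with one rule that queues 'InputTextField' whenever 'Form' is present and another that queues 'EventTrigger' whenever 'InputTextField' is present, starting from {Form} reaches the fixpoint {Form, InputTextField, EventTrigger} on the third round, when nothing new is queued.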
    detail : DetailWire ( (from == source && to == target) || (from == target && to == source) )
    The above is turned effectively into an MVEL statement; you might
    get better performance with a ConditionalElement 'or' as long as the
    two are mutually exclusive:

     ( DetailWire (from == source, to == target ) or
       DetailWire (from == target, to == source) )
I thought this was the case. However, in this case you can't bind the variable "detail" (the Drools compiler won't accept the syntax); is this correct? I think one solution is to split the rule into two separate rules, one for each "or" part (hence a DSL), since I don't want to have to expand these rules by hand.

( $d : DetailWire (from == source, to == target ) or
   $d : DetailWire (from == target, to == source) )

is valid
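Mark's caveat that the two branches should be mutually exclusive matters because, as we understand Drools' CE 'or', it behaves like separate sub-rules: a fact that satisfies both branches produces one match per branch. With 'from == source, to == target' and its mirror image, that happens exactly when 'source' and 'target' are the same object. The plain-Java sketch below (a hypothetical 'Wire' record, no Drools involved) contrasts the two semantics.

```java
import java.util.ArrayList;
import java.util.List;

final class OrSemantics {
    // Hypothetical stand-in for a DetailWire/SyncWire fact.
    record Wire(String name, String from, String to) {}

    // One pattern with an inline || constraint: each wire matches at most once.
    static List<String> inlineOr(List<Wire> wires, String source, String target) {
        List<String> matches = new ArrayList<>();
        for (Wire w : wires) {
            if ((w.from().equals(source) && w.to().equals(target))
                    || (w.from().equals(target) && w.to().equals(source))) {
                matches.add(w.name());
            }
        }
        return matches;
    }

    // CE 'or' modelled as two sub-rules whose matches are concatenated,
    // so a wire satisfying both branches is reported twice.
    static List<String> conditionalElementOr(List<Wire> wires, String source, String target) {
        List<String> matches = new ArrayList<>();
        for (Wire w : wires) {
            if (w.from().equals(source) && w.to().equals(target)) matches.add(w.name());
        }
        for (Wire w : wires) {
            if (w.from().equals(target) && w.to().equals(source)) matches.add(w.name());
        }
        return matches;
    }
}
```

With wires a->b, b->a and a self-loop c->c, both versions report the two wires for the pair (a, b), but for (c, c) the inline '||' reports the loop once while the CE-style version reports it twice.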


    And then I'm not sure what it is you are doing in the last two
    patterns, but it looks wrong.
    text : InputTextField ( eContainer == form, eval (functions.getAutocompleteInputName(attribute).equals(name)) )
    onInput : EventTrigger ( text.onInput == onInput )
    currentInput : Property ( text.currentInput == currentInput )

The point of this rule is to select something like the following (from an EMF instance):


<child name="form">
<child xsi:type="InputTextField" name="...">
<onInput xsi:type="EventTrigger" ... />
<currentInput xsi:type="Property" ... />
<events xsi:type="EventTrigger" ... />
<properties xsi:type="Property" ... />
</child>
</child>

EventTrigger ( text.onInput == onInput ): is the 'text' here the bound variable text, or a field on EventTrigger? The logic isn't very clear. What you have written would look like this in Java:

eventTrigger.getText().getOnInput().equals( eventTrigger)

Is that what you were expecting? That's why we often use the $ prefix to differentiate fields from variables.

I can't use 'eContainer', because 'text' can also contain EventTriggers in 'text.events'. These bound variables are then supposed to be used later within the rule, either to select other variables or as part of the created element.
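The reason 'eContainer' is not selective enough can be shown with hypothetical plain-Java stand-ins for the EMF classes ('TextField' and 'Trigger' are illustrative; generated EMF code would use its own getters and eContainer()). Selecting by container returns every trigger held by the text field, including those in 'events', while an identity check against the 'onInput' reference returns exactly one.

```java
import java.util.ArrayList;
import java.util.List;

final class ContainmentVsReference {
    // Hypothetical stand-in for an EventTrigger.
    static final class Trigger {
        final String name;
        Object container;   // plays the role of eContainer()
        Trigger(String name) { this.name = name; }
    }

    // Hypothetical stand-in for an InputTextField with two containment references.
    static final class TextField {
        Trigger onInput;                                  // 'onInput'
        final List<Trigger> events = new ArrayList<>();  // 'events'

        void setOnInput(Trigger t) { onInput = t; t.container = this; }
        void addEvent(Trigger t) { events.add(t); t.container = this; }
    }

    // Selecting by container: every trigger the text field holds, whichever reference it sits in.
    static List<String> byContainer(List<Trigger> all, TextField text) {
        List<String> out = new ArrayList<>();
        for (Trigger t : all) if (t.container == text) out.add(t.name);
        return out;
    }

    // Selecting by identity with the text field's 'onInput' reference: at most one trigger.
    static List<String> byReference(List<Trigger> all, TextField text) {
        List<String> out = new ArrayList<>();
        for (Trigger t : all) if (t == text.onInput) out.add(t.name);
        return out;
    }
}
```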

I am going to try to remove unused bound variables, though. I think I will write a script to analyse the exported XML of the rules automatically (I have 264 rules written by hand).
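A script like that could start as a simple text heuristic. The sketch below is hypothetical: it scans DRL source text (not the exported XML), treats 'name : Type(' as a pattern binding, and reports bindings whose identifier occurs nowhere else in the rule. It will misreport some cases (for example a binding that shares its name with a field used elsewhere), so its output is a list of candidates to check by hand. It also flags bindings inside a 'not', which cannot be used outside the 'not' anyway.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical heuristic: works on DRL source text, not on a parsed rule model. */
final class UnusedBindings {
    // Matches pattern bindings such as "detail : DetailWire (" or "$d : DetailWire(".
    private static final Pattern BINDING =
            Pattern.compile("([A-Za-z_$][\\w$]*)\\s*:\\s*[A-Za-z_][\\w.]*\\s*\\(");

    /** Returns binding names that appear only once (at the binding itself), in order of appearance. */
    static List<String> find(String ruleText) {
        List<String> unused = new ArrayList<>();
        Matcher m = BINDING.matcher(ruleText);
        while (m.find()) {
            String name = m.group(1);
            // Count whole-identifier occurrences; '$' is treated as part of an identifier.
            Pattern use = Pattern.compile("(?<![\\w$])" + Pattern.quote(name) + "(?![\\w$])");
            Matcher u = use.matcher(ruleText);
            int count = 0;
            while (u.find()) count++;
            if (count == 1 && !unused.contains(name)) unused.add(name);
        }
        return unused;
    }
}
```

For a rule that binds 'container', 'source' and, inside a 'not', 'form', where only 'form' is never referenced again, find(...) returns [form].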


Thanks
Jevon

[1]: J. Wright and J. Dietrich, "Non-Monotonic Model Completion in Web Application Engineering," in Proceedings of the 21st Australian Software Engineering Conference (ASWEC 2010) <http://aswec2010.massey.ac.nz/>, Auckland, New Zealand, 2010. http://openiaml.org/#completion

2010/7/16 Mark Proctor <mproc...@codehaus.org>


    detail : DetailWire ( (from == source && to == target) || (from == target && to == source) )
    The above is turned effectively into an MVEL statement; you might get
    better performance with a ConditionalElement 'or' as long as the
    two are mutually exclusive:

      ( DetailWire (from == source, to == target ) or
        DetailWire (from == target, to == source) )

    I saw you did this:
    not ( form : InputForm ( eContainer == container, name == iterator.name ) )

    The 'form' is not accessible outside the 'not', and that rule does not need it.

    Is this not a bug? You bind "text", and then I'm not sure what it is you
    are doing in the last two patterns, but it looks wrong.
    text : InputTextField ( eContainer == form, eval (functions.getAutocompleteInputName(attribute).equals(name)) )
    onInput : EventTrigger ( text.onInput == onInput )
    currentInput : Property ( text.currentInput == currentInput )

    It doesn't look like you are updating the session with facts, i.e. it's a
    stateless session. See if this helps:

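    import org.drools.KnowledgeBase;
    import org.drools.KnowledgeBaseConfiguration;
    import org.drools.KnowledgeBaseFactory;
    import org.drools.conf.SequentialOption;
    import org.drools.runtime.StatelessKnowledgeSession;

    KnowledgeBaseConfiguration kconf = KnowledgeBaseFactory.newKnowledgeBaseConfiguration();
    kconf.setOption( SequentialOption.YES );   // sequential mode: no working memory updates expected

    KnowledgeBase kbase = KnowledgeBaseFactory.newKnowledgeBase( kconf );
    // (add the compiled rule packages to kbase before creating sessions)
    final StatelessKnowledgeSession ksession = kbase.newStatelessKnowledgeSession();
    ksession.execute(....);   // a list of facts, or a batch of commands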

    In the execute you can provide it with a batch of commands to execute, or
    just a list of objects; it's up to you. See the stateless session
    documentation for more details.

    The SequentialOption may help memory a small amount, if you aren't doing
    any working memory modifications (insert/modify/update/retract).

    Mark


    On 16/07/2010 04:16, Jevon Wright wrote:

    Hi again,

    By removing all of the simple eval()s from my rules, I have cut
    heap usage by at least an order of magnitude. However this still
    isn't enough.

    Since I am trying to reduce the cross-product size (as in SQL), I
    recall that most SQL implementations have an EXPLAIN statement (in
    MySQL also spelled DESCRIBE) which reports how a given query will be
    executed, i.e. the estimated size of the tables, the indexes used,
    and so on. Is there any such tool available for Drools? Are there
    any tools which can provide clues as to which rules are using the
    most memory?

    Alternatively, I am wondering what kind of benefit I could expect
    from using materialized views to create summary tables; that is,
    deriving and inserting additional facts. This would allow Drools
    to rewrite queries that currently use eval(), but would increase
    the size of working memory, so would this actually save heap size?
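One way to picture the "summary table" idea, sketched in plain Java with hypothetical names: derive one normalised 'Connection' entry per 'SyncWire' up front, so that a bidirectional "are these two connected?" test becomes a single hash lookup instead of a scan with an '||' constraint. In a rule base the equivalent would be inserting derived facts; whether those extra facts cost less heap than the joins they replace is exactly the open question here, and only measurement can answer it.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class ConnectionSummary {
    // Hypothetical stand-in for a bidirectional SyncWire fact.
    record SyncWire(String from, String to) {}

    // Hypothetical derived "summary fact": an unordered pair of endpoints.
    record Connection(String a, String b) {
        static Connection of(String x, String y) {
            // Normalise so that (x, y) and (y, x) produce the same key.
            return x.compareTo(y) <= 0 ? new Connection(x, y) : new Connection(y, x);
        }
    }

    private final Set<Connection> connections = new HashSet<>();

    ConnectionSummary(List<SyncWire> wires) {
        for (SyncWire w : wires) connections.add(Connection.of(w.from(), w.to()));
    }

    /** Bidirectional check as one hash lookup instead of a scan over all wires. */
    boolean connects(String source, String target) {
        return connections.contains(Connection.of(source, target));
    }

    /** Number of derived entries: the extra memory this summary costs. */
    int size() { return connections.size(); }
}
```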

    To what extent does Drools rewrite queries? Is there any
    documentation describing the approaches used?

    Any other ideas on how to reduce heap memory usage? I'd
    appreciate any ideas :)

    Thanks
    Jevon


    On Mon, Jul 12, 2010 at 5:56 PM, Jevon Wright <je...@jevon.org> wrote:

        Hi Wolfgang and Mark,

        Thank you for your replies! You were correct: my eval() functions
        could generally be rewritten into Drools directly.

        I had one function "connectsDetail" that was constraining
        unidirectional edges, and could be rewritten from:
         detail : DetailWire ( )
         eval ( functions.connectsDetail(detail, source, target) )

        to:
         detail : DetailWire ( from == source, to == target )

        Another function, "connects", was constraining bidirectional
        edges, and could be rewritten from:
         sync : SyncWire( )
         eval ( functions.connects(sync, source, target) )

        to:
         sync : SyncWire( (from == source && to == target) || (from == target && to == source) )

        Finally, the "veto" function could be rewritten from:
         detail : DetailWire ( )
         eval ( handler.veto(detail) )

        to:
         detail : DetailWire ( overridden == false )
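Why inlining helps can be seen in a rough cost model, sketched in plain Java with hypothetical names: when the test is hidden inside a function called from eval(), every (source, target, wire) combination has to be built before it can be rejected, whereas an equality constraint on fields can be answered by a lookup keyed on the field values. The HashMap here only stands in for that kind of lookup; it is not a claim about how Drools indexes any particular join.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class EvalVersusIndex {
    record DetailWire(String from, String to) {}
    record Cost(long checks, long matches) {}

    // Stand-in for an opaque helper such as functions.connectsDetail(...).
    static boolean connectsDetail(DetailWire w, String source, String target) {
        return w.from().equals(source) && w.to().equals(target);
    }

    /** eval()-style: every (source, target, wire) combination is built and then tested. */
    static Cost evalStyle(List<String> sources, List<String> targets, List<DetailWire> wires) {
        long checks = 0, matches = 0;
        for (String s : sources)
            for (String t : targets)
                for (DetailWire w : wires) {
                    checks++;
                    if (connectsDetail(w, s, t)) matches++;
                }
        return new Cost(checks, matches);
    }

    /** Constraint-style: wires are indexed by (from, to), so each pair needs one lookup. */
    static Cost indexedStyle(List<String> sources, List<String> targets, List<DetailWire> wires) {
        Map<DetailWire, Integer> index = new HashMap<>();
        for (DetailWire w : wires) index.merge(w, 1, Integer::sum);  // the record is the (from, to) key
        long checks = 0, matches = 0;
        for (String s : sources)
            for (String t : targets) {
                checks++;
                matches += index.getOrDefault(new DetailWire(s, t), 0);
            }
        return new Cost(checks, matches);
    }
}
```

With 3 sources, 2 targets and 4 wires, both versions find the same 3 matches, but the eval-style scan performs 24 checks against 6 lookups for the indexed version.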

        I took each of these three changes and evaluated them separately [1].
        I found that:

        1. Inlining 'connectsDetail' made a huge difference: 10-30% faster
           execution and 50-60% less allocated heap.
        2. Inlining 'connects' made very little difference: 10-30% faster
           execution, but 0-20% more allocated heap.
        3. Inlining 'veto' made no difference: no significant change in
           execution speed or allocated heap.

        I think I understand why inlining 'connects' would improve heap
        usage: because the rules essentially have more conditionals?

        I also understand why 'veto' made no difference: for most of my test
        models, "overridden" was never true, so adding this conditional was
        not making the cross-product set any smaller.

        Finally, I also tested simply joining all of the rules together into
        one file. This happily made no difference at all (although it made it
        more difficult to edit).

        So I think I can safely conclude that eval() should be used as little
        as possible. However, this means that the final rules are more
        complicated and less human-readable, so a DSL may be best for my
        common rule patterns in the future.
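Until a DSL is in place, repeated patterns such as the bidirectional 'connects' test could also be generated rather than expanded by hand. A minimal sketch in Java, assuming rules are assembled as text before compilation; the helper 'connectsPattern' is hypothetical, and the fragment it emits uses the CE 'or' form with the same binding in each branch, which Mark confirms is valid elsewhere in this thread. Drools' own DSL mechanism (.dsl mapping files) is the supported way to do this; the sketch only illustrates the expansion.

```java
final class RuleSnippets {
    /**
     * Emits a CE 'or' that binds the same variable in both branches, e.g.
     * ( $d : DetailWire( from == source, to == target ) or
     *   $d : DetailWire( from == target, to == source ) )
     */
    static String connectsPattern(String binding, String type, String a, String b) {
        return "( " + binding + " : " + type + "( from == " + a + ", to == " + b + " ) or\n"
             + "  " + binding + " : " + type + "( from == " + b + ", to == " + a + " ) )";
    }
}
```

As discussed in the thread, this form is only safe when the two branches cannot match the same fact, i.e. when 'source' and 'target' are never the same object.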

        Thanks again!
        Jevon

        [1]:
        http://www.jevon.org/wiki/Improving_Drools_Memory_Performance

        On Sat, Jul 10, 2010 at 12:28 AM, Wolfgang Laun <wolfgang.l...@gmail.com> wrote:
        > On 9 July 2010 14:14, Mark Proctor <mproc...@codehaus.org> wrote:
        >>  You have many objects there that are not constrained;
        >
        > I have an inkling that the functions.*() are hiding just these
        > constraints. It's certainly the wrong way, starting with oodles of
        > node pairs, just to pick out connected ones by fishing for the
        > connecting edge. And this is worsened by trying to find two such
        > pairs which meet at some DomainSource.
        >
        > Guesswork, hopefully educated ;-)
        >
        > -W
        >
        >
        >> if there are multiple versions of those objects you are going to
        >> get massive amounts of cross products. Think in terms of SQL: each
        >> pattern you add is like an SQL join.
        >>
        >> Mark
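Mark's SQL analogy can be turned into a back-of-envelope estimate: when patterns are unconstrained, the number of partial matches after each join is roughly the running product of the fact counts. A sketch in plain Java; the counts in the example are made up for illustration and are not measurements from Jevon's model.

```java
final class CrossProduct {
    /**
     * Worst-case partial matches after each pattern, assuming no constraint
     * links a pattern to the ones before it (every join is a full cross product).
     */
    static long[] runningProduct(long... factCounts) {
        long[] sizes = new long[factCounts.length];
        long size = 1;
        for (int i = 0; i < factCounts.length; i++) {
            size = Math.multiplyExact(size, factCounts[i]);  // throws on overflow instead of wrapping
            sizes[i] = size;
        }
        return sizes;
    }
}
```

For example, runningProduct(40, 5, 10, 20), standing in for four unconstrained patterns such as Frame, DomainSchema, DomainSource and DomainIterator, gives [40, 200, 2000, 40000]. A constraint that links a pattern to earlier ones replaces its factor with the average number of matches per partial match, which is why constraining early patterns keeps the later memories small.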
        >> On 09/07/2010 09:20, Jevon Wright wrote:
        >>> Hi everyone,
        >>>
        >>> I am working on what appears to be a fairly complex rule base based
        >>> on EMF. The rules aren't operating over a huge number of facts (fewer
        >>> than 10,000 EObjects) and there aren't too many rules (fewer than
        >>> 300), but I am having a problem with running out of Java heap space
        >>> (set at ~400 MB).
        >>>
        >>> Through investigation, I came to the conclusion that this is due to
        >>> the design of the rules, rather than the number of facts. The engine
        >>> uses less memory inserting many facts that use simple rules, compared
        >>> with inserting few facts that use many rules.
        >>>
        >>> Can anybody suggest some tips for reducing heap memory usage in
        >>> Drools? I don't have a time constraint, only a heap/memory constraint.
        >>> A sample rule in my project looks like this:
        >>>
        >>>    rule "Create QueryParameter for target container of DetailWire"
        >>>      when
        >>>        container : Frame( )
        >>>        schema : DomainSchema ( )
        >>>        domainSource : DomainSource ( )
        >>>        instance : DomainIterator( )
        >>>        selectEdge : SelectEdge ( eval ( functions.connectsSelect(selectEdge, instance, domainSource )) )
        >>>        schemaEdge : SchemaEdge ( eval ( functions.connectsSchema(schemaEdge, domainSource, schema )) )
        >>>        source : VisibleThing ( eContainer == container )
        >>>        target : Frame ( )
        >>>        instanceSet : SetWire ( eval(functions.connectsSet(instanceSet, instance, source )) )
        >>>        detail : DetailWire ( )
        >>>        eval ( functions.connectsDetail(detail, source, target ))
        >>>        pk : DomainAttribute ( eContainer == schema, primaryKey == true )
        >>>        not ( queryPk : QueryParameter ( eContainer == target, name == pk.name ) )
        >>>        eval ( handler.veto( detail ))
        >>>
        >>>      then
        >>>        QueryParameter qp = handler.generatedQueryParameter(detail, target);
        >>>        handler.setName(qp, pk.getName());
        >>>        queue.add(qp, drools); // wraps insert(...)
        >>>
        >>>    end
        >>>
        >>> I try to order the select statements in an order that will reduce
        >>> the size of the cross-product (in theory), but I also try to keep the
        >>> rules fairly human-readable. I try to avoid comparison operators like
        >>> < and >. Analysing a heap dump shows that most of the memory is being
        >>> used in StatefulSession.nodeMemories > PrimitiveLongMap.
        >>>
        >>> I am using a StatefulSession; if I understand correctly, I can't use
        >>> a StatelessSession with sequential mode since I am inserting facts as
        >>> part of the rules. If I also understand correctly, I'd like the Rete
        >>> graph to be tall, rather than wide.
        >>>
        >>> Some ideas I have thought of include the following:
        >>> 1. Creating a separate intermediary meta-model to split up the sizes
        >>>    of the rules, e.g. instead of (if A and B and C then insert D),
        >>>    using (if A and B then insert E; if E and C then insert D).
        >>> 2. Moving eval() statements directly into the Type(...) selectors.
        >>> 3. Removing eval() statements. Would this allow for better indexing
        >>>    by the Rete algorithm?
        >>> 4. Reducing the height, or the width, of the class hierarchy of the
        >>>    facts, e.g. removing interfaces or abstract classes to reduce the
        >>>    possible matches. Would this make a difference?
        >>> 5. Conversely, increasing the height, or the width, of the class
        >>>    hierarchy, e.g. adding interfaces or abstract classes to reduce
        >>>    field accessors.
        >>> 6. Instead of using EObject.eContainer, creating an explicit
        >>>    containment property in all of my EObjects.
        >>> 7. Creating a DSL that is human-readable, but allows for the
        >>>    automation of some of these approaches.
        >>> 8. Moving all rules into one rule file, or splitting up rules into
        >>>    smaller files.
        >>>
        >>> Is there some kind of profiler for Drools that will let me see the
        >>> size (or the memory usage) of particular rules, or of the memory used
        >>> after inference? Ideally I'd use this to profile any changes.
        >>>
        >>> Thanks for any thoughts or tips! :-)
        >>>
        >>> Jevon
        >>> _______________________________________________
        >>> rules-users mailing list
        >>> rules-users@lists.jboss.org
        >>> https://lists.jboss.org/mailman/listinfo/rules-users