Hi Wolfgang,

I finally decided to test different implementations:

   * first based on an accumulation function
   * second (your suggestion) relying on drools to 1) build all
     SentenceWindows then to 2) locate ManualAnnotations inside those
     Windows
   * third (your suggestion as well) relying on drools to 1) build only
     SentenceWindows that might be interesting (containing one of the
     ManualAnnotations I am looking for) then to 2) locate
     ManualAnnotations inside those Windows

   * The first implementation performs quite well, but I am stuck on the
     parametrization (I need to define build2windows, build3windows,
     build4windows... functions): 47 ms on 100 sentences, 94 ms on 1000
     sentences.
   * The second implementation is of course suboptimal, since it creates
     many useless windows: 125 ms on 100 sentences, 14400 ms on 1000
     sentences.
   * The third implementation is very versatile and its performance is
     comparable to that of the accumulator solution: 93 ms on 100
     sentences, 125 ms on 1000 sentences.
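
For illustration only, here is a minimal procedural sketch in Java of the idea behind the third variant: enumerate only the windows that contain at least one annotation. It is not the Drools rule set that was measured. The class name CandidateWindows, the int[]{start, end} representation with inclusive offsets, and the binary search are all assumptions made for this example.

```java
import java.util.TreeSet;

final class CandidateWindows {
    // sentences: {start, end} pairs, sorted by start and non-overlapping (assumed).
    // annotations: {start, end} pairs, each lying inside a single sentence.
    // Returns the index of the first sentence of every window of `size`
    // consecutive sentences that contains at least one annotation.
    static TreeSet<Integer> candidateStarts(int[][] sentences, int[][] annotations, int size) {
        TreeSet<Integer> starts = new TreeSet<>();
        for (int[] a : annotations) {
            int idx = indexOfSentenceContaining(sentences, a);
            if (idx < 0) continue; // annotation is not inside any sentence
            // A window starting at `first` covers sentences first .. first+size-1.
            int lo = Math.max(0, idx - size + 1);
            int hi = Math.min(idx, sentences.length - size);
            for (int first = lo; first <= hi; first++) starts.add(first);
        }
        return starts;
    }

    // Binary search for the sentence whose span contains the annotation.
    static int indexOfSentenceContaining(int[][] sentences, int[] a) {
        int lo = 0, hi = sentences.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (a[0] < sentences[mid][0]) hi = mid - 1;
            else if (a[0] > sentences[mid][1]) lo = mid + 1;
            else return a[1] <= sentences[mid][1] ? mid : -1;
        }
        return -1;
    }
}
```

With ten sentences, a window size of 3 and two annotations, only the windows around the annotated sentences are enumerated instead of all eight possible ones, which is why the number of windows follows the number of annotations rather than the number of sentences.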

So thanks again for your suggestion; it was definitely useful :-).

Regards,

Bruno.

On 19/08/2011 17:25, Wolfgang Laun wrote:
2011/8/19 Bruno Freudensprung <[email protected]>


    I am not sure I understand what you mean by "random order" but I
    guess it has to do with my ArrayList result type.
    What I had in mind is to put all sentences in a TreeSet during the
    "action" method, and finally issue an ArrayList result object by
    iterating over the TreeSet and grouping sentences.
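
A minimal Java sketch of the TreeSet idea described above, for illustration only. It does not implement the actual Drools accumulate-function interface; the names WindowAccumulator, accumulate and result are hypothetical stand-ins for the accumulator's "action" and "result" steps, and Sentence and SentenceWindow are reduced to plain start/end offsets.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical stand-in for the Sentence fact, ordered by start offset.
final class Sentence implements Comparable<Sentence> {
    final int start, end;
    Sentence(int start, int end) { this.start = start; this.end = end; }
    @Override public int compareTo(Sentence o) { return Integer.compare(start, o.start); }
}

// Hypothetical stand-in for the SentenceWindow fact.
final class SentenceWindow {
    final int start, end;
    SentenceWindow(int start, int end) { this.start = start; this.end = end; }
}

final class WindowAccumulator {
    // The TreeSet keeps sentences sorted, so arrival order is irrelevant.
    // Assumption: no two sentences share the same start offset.
    private final TreeSet<Sentence> sorted = new TreeSet<>();

    // Plays the role of the "action" step: called once per Sentence.
    void accumulate(Sentence s) { sorted.add(s); }

    // Plays the role of the "result" step: one window per run of
    // `size` consecutive sentences.
    List<SentenceWindow> result(int size) {
        List<Sentence> ordered = new ArrayList<>(sorted);
        List<SentenceWindow> windows = new ArrayList<>();
        for (int i = 0; i + size <= ordered.size(); i++) {
            windows.add(new SentenceWindow(ordered.get(i).start,
                                           ordered.get(i + size - 1).end));
        }
        return windows;
    }
}
```

Passing the window size to result() is a liberty taken in this sketch; in the accumulate syntax discussed in this thread, that size cannot be passed as a parameter.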


Heh :) I clean forgot that I had done this sort of thing not too long ago.

    My first guess was that such an accumulator might be faster than a
    construction of windows using rules.
    However I admit your suggestion is very elegant, and I thank you
    for that! I am probably still too imperative-minded...


Well, a procedural solution would be a reasonable alternative for this problem.

-W


    Regards,

    Bruno.

    On 19/08/2011 16:05, Wolfgang Laun wrote:
    How would you write "buildwindows", given that its "action"
    method would be called once for each Sentence, in random order?

    It's very simple to write a very small set of rules to construct
    all SentenceWindow facts of size 1 and then to extend them to any
    desired size, depending on some parameter.
    1. Given a Sentence and no Window beginning with it, create a
    Window of length 1.
    2. Given a Window of size n < desiredSize and given a Sentence
    immediately following it, extend the Window to one of size n+1.
    3a. For any Window of desiredSize, inspect it for "closely
    situated ManualAnnotations".
    3b. If ManualAnnotations have been associated with their
    containing Sentences up-front, you just need to find Windows with
    more than 1 ManualAnnotation, adding them in the RHS of rule 2 above.
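
The steps above can be mirrored procedurally. The following Java sketch is only an analogue under stated assumptions, not Drools rules: the Span type (inclusive offsets, used for sentences, windows and annotations alike) and the WindowRules class are hypothetical, and the sentences are assumed to be pre-sorted by start offset.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical value type: a pair of inclusive character offsets.
final class Span {
    final int start, end;
    Span(int start, int end) { this.start = start; this.end = end; }
    boolean contains(Span other) { return start <= other.start && other.end <= end; }
}

final class WindowRules {
    // Steps 1 and 2: `sentences` must already be sorted by start offset.
    static List<Span> buildWindows(List<Span> sentences, int desiredSize) {
        List<Span> windows = new ArrayList<>();
        for (int first = 0; first < sentences.size(); first++) {
            // Step 1: a window of length 1 beginning with this sentence.
            int size = 1;
            int end = sentences.get(first).end;
            // Step 2: extend while size < desiredSize and a sentence follows.
            while (size < desiredSize && first + size < sentences.size()) {
                end = sentences.get(first + size).end;
                size++;
            }
            // Only windows that reached desiredSize are inspected later.
            if (size == desiredSize) {
                windows.add(new Span(sentences.get(first).start, end));
            }
        }
        return windows;
    }

    // Step 3a: keep the windows containing more than one annotation.
    static List<Span> windowsWithSeveralAnnotations(List<Span> windows,
                                                    List<Span> annotations) {
        List<Span> hits = new ArrayList<>();
        for (Span w : windows) {
            int inside = 0;
            for (Span a : annotations) {
                if (w.contains(a)) inside++;
            }
            if (inside > 1) hits.add(w);
        }
        return hits;
    }
}
```

Here desiredSize is an ordinary argument, which matches the point made above that the window size can depend on some parameter.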

    -W


    2011/8/19 Bruno Freudensprung <[email protected]>


        Hi Wolfgang,

        Thanks for your answer.
        Sentences are not contiguous (might be some space characters
        in between) but manual annotations cannot overlap sentences
        (interpret "overlap" in terms of Drools Fusion terminology).
        If I had an "inside" operator, do you think the following
        accumulate option could be better?

        when
            $result : ArrayList() from accumulate ( $s: Sentence(),
                                                    buildwindows($s))
            $w : SentenceWindow () from $result
            a1 : ManualAnnotation (this inside $w)
            a2 : ManualAnnotation (this != a1, this inside $w)
        then
            ... do something with a1 and a2 since they are "close" to
            each other
        end

        Does anyone know something about accumulator parametrization
        (looking at the source code it does not seem to be possible,
        though)?
        Maybe a syntax inspired by operator parametrization could be
        nice:

            $result : ArrayList() from accumulate ( $s: Sentence(),
                                                    buildwindows[3]($s))

        Best regards,

        Bruno.

        On 19/08/2011 13:55, Wolfgang Laun wrote:
        There are some details that one should consider before
        deciding on a particular implementation technique.

            * Are all Sentences contiguous, i.e., s1.end = pred(
              s2.start )?
            * Can a ManualAnnotation start on one Sentence and end
              in the next or any further successor?

        As in all problems where constraints depend on an order
        between facts, performance is going to be a problem with
        increasing numbers of Sentences and ManualAnnotations.

        Your accumulate plan could be a very inefficient approach.
        Creating O(N*N) pairs and then looking for an overlapping
        window is much worse than looking at each window, for
        instance. But it depends on the expected numbers for both.

        -W



        2011/8/19 Bruno Freudensprung <[email protected]>

            Hello,

            I am trying to implement rules handling "Sentence" and
            "ManualAnnotation" objects (imagine someone highlighting
            words of the document). Basically, "Sentence" objects
            have "start" and "end" positions (fields) in the text
            of a document, and they are Comparable according to
            their location in the document.

            I need to write rules using the notion "window of
            consecutive sentences".

            Basically, I am not very interested in those
            "SentenceWindow" objects; I just need them to define a
            kind of proximity between "ManualAnnotation" objects.
            What I eventually need in the "when" of my rule is
            something like:

            when
                ... maybe something creating the windows
                a1 : ManualAnnotation ()
                a2 : ManualAnnotation (this != a1)
                SentenceWindow (this includes a1, this includes a2)
            then
                ... do something with a1 and a2 since they are
            "close" to each other
            end

            As I don't know the "internals" of Drools, I would like
            to have your opinion about what the best "idiom" is:

                * create all SentenceWindow objects and insert them
                  in the working memory, then write rules against
                  all the facts (SentenceWindow and ManualAnnotation)
                * implement an accumulator that will create a list
                  of SentenceWindow objects


            The first option could look like:

            rule "Create sentence windows"
               when
                  # find 3 consecutive sentences
                  s1 : Sentence()
                  s2 : Sentence(this > s1)
                  s3 : Sentence(this > s2)
                  not Sentence(this != s2 && > s1 && < s3)
               then
                  SentenceWindow swindow = new SentenceWindow();
                  swindow.setStart(s1.getStart());
                  swindow.setTheend(s3.getEnd());
                  insert(swindow);
            end

            ... Then use the first rule "as is".

            The accumulator option could look like (I am not really
            sure the syntax is correct) :

            when
                $result : ArrayList() from accumulate ( $s: Sentence(),
                                                        buildwindows($s))
                a1 : ManualAnnotation ()
                a2 : ManualAnnotation (this != a1)
                SentenceWindow (this includes a1, this includes a2)
                    from $result
            then
                ... do something with a1 and a2 since they are
                "close" to each other
            end

            Is it possible to decide whether one way is better than the
            other?

            And one last question: is it possible to "parametrize"
            an accumulator (in order to provide the number of
            sentences that should be put in the windows)?
            I mean something like:

            when
                $result : ArrayList() from accumulate ( $s:
                Sentence(), buildwindows(3, $s))


            Thanks in advance for your insights,

            Best regards,

            Bruno.

            _______________________________________________
            rules-users mailing list
            [email protected]
            <mailto:[email protected]>
            https://lists.jboss.org/mailman/listinfo/rules-users


