Re: Marking consecutive tokens with RUTA

Peter Klügl Wed, 10 Jun 2015 12:30:28 -0700

Hi,

here are the results of my investigations:

- The text of the document is not set directly. You should add something like cas.setDocumentText(sentence.getDocumentText()); before populating the CAS in your method. Otherwise there will be a DocumentAnnotation of length 0. Ruta does not like these... that's the source of the problem. If you add the line, or avoid zero-length annotations somehow, then the rules should work just fine.

- I'd rather use tcas.addFsToIndexes(sentenceAnn); instead of tcas.getIndexRepository().addFS(sentenceAnn); (but that shouldn't change anything).

- You access the problem type "cogroo.ruta.Base.PROBLEM", but the rules seem to use the type "Main.PROBLEM".


Best,

Peter


Am 03.06.2015 um 19:14 schrieb Diego Buoro:

Hi Peter,

the example we used is the short sentence in a string at the end of UIMAChecker.java: "Refiro-me à trabalho remunerado.". Based on the Main.ruta we sent you, we expected the output to contain 7 "PROBLEM" annotations. This part is working.

The problem appears when we change the last line of Main.ruta from "cgToken{->PROBLEM};" to "cgToken cgToken{->PROBLEM};". In this case we expected 6 "PROBLEM" annotations: the same ones we had in the first example, except for the first one. That is what happens when you run the script in a simple Ruta project, but when we run it in the Java application we get 0 "PROBLEM" annotations.

We think this difference arises because in the Ruta project we don't use plain text as input. Instead, we feed it a preprocessed xmi file. In the Java application, on the other hand, we do the processing ourselves via the processCas method. It's possible that the processCas method is creating tokens in a way that prevents the Ruta script from detecting when one is next to another.

We are sending you the xmi file to use as an example for a simple Ruta project. If there are any other examples you'd like us to send, just say the word :D
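
To make the expected counts concrete, here is a minimal plain-Java sketch that models the two rule variants. It does not call Ruta or UIMA. The class ConsecutiveTokenModel, its Span type, and the 7-token split of the sentence are hypothetical names and assumptions chosen for illustration (the real split is whatever the CoGrOO tokenizer produces). The sketch only encodes the behaviour described in this message: the action is attached to the second rule element, so every token except the first ends up marked.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java model of the two rule variants; it does not call Ruta or UIMA. */
final class ConsecutiveTokenModel {

    /** A token span given by character offsets into the sentence. */
    static final class Span {
        final int begin;
        final int end;

        Span(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }
    }

    /** Hypothetical 7-token split of "Refiro-me à trabalho remunerado." */
    static List<Span> sampleTokens() {
        List<Span> tokens = new ArrayList<>();
        tokens.add(new Span(0, 6));   // Refiro
        tokens.add(new Span(6, 7));   // -
        tokens.add(new Span(7, 9));   // me
        tokens.add(new Span(10, 11)); // à
        tokens.add(new Span(12, 20)); // trabalho
        tokens.add(new Span(21, 31)); // remunerado
        tokens.add(new Span(31, 32)); // .
        return tokens;
    }

    /** Models cgToken{->PROBLEM}; one PROBLEM span per token. */
    static List<Span> markEveryToken(List<Span> tokens) {
        return new ArrayList<>(tokens);
    }

    /**
     * Models cgToken cgToken{->PROBLEM}; the action belongs to the second
     * rule element, so every token that follows another token is marked.
     */
    static List<Span> markSecondOfPair(List<Span> tokens) {
        List<Span> problems = new ArrayList<>();
        for (int i = 1; i < tokens.size(); i++) {
            problems.add(tokens.get(i));
        }
        return problems;
    }

    public static void main(String[] args) {
        List<Span> tokens = sampleTokens();
        System.out.println(markEveryToken(tokens).size());   // prints 7
        System.out.println(markSecondOfPair(tokens).size()); // prints 6
    }
}
```

A result of 0 instead of 6 in the Java application therefore means that no pair of tokens was matched as consecutive at all, not that a single boundary case was missed.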


Best,

Diego

2015-06-01 11:15 GMT-03:00 Diego Buoro <[email protected]>:


    Sorry, please disregard my last answer. The idea wasn't to use the
    xmi; we are still working on a minimal example to provide to you.
    We will send it to you in the next few days.

    2015-06-01 10:37 GMT-03:00 Diego Buoro <[email protected]>:

        Hi Peter, how are you doing?

        We were trying to run with files such as Crase01.xmi and
        rule_xml_001.xmi.

        Our goal is to run those two simpler ones first, and then
        run with Crase.xmi.

        About the package declaration, I still need to check which
        Ruta version we are using.
        I will check this soon.

        All Best,

        Diego





        2015-05-30 0:45 GMT-03:00 Diego Buoro <[email protected]>:

            Hi Peter!
            No problem, I appreciate your support.

            All Best,

            Diego

            2015-05-27 14:22 GMT-03:00 Diego Buoro <[email protected]>:

                Hi Peter!
                We call the script with the following lines:

                // Load the Ruta script from the classpath
                URL url = Resources.getResource("Main.ruta");
                String text = Resources.toString(url, Charsets.UTF_8);
                // Build an analysis engine from the script and our type system
                AnalysisEngineDescription aeDes =
                    Ruta.createAnalysisEngineDescription(text, tsd);
                this.ae = UIMAFramework.produceAnalysisEngine(aeDes);

                // Fill a fresh CAS with our annotations and run the rules
                CAS cas = ae.newCAS();
                converter.populateCas(sentence.getTextSentence(), cas);
                ae.process(cas);

                The populateCas method is responsible for translating
                our annotations into RUTA annotations, but it doesn't
                set any type priority explicitly.
                We don't know much about type priorities; the RUTA
                references we found say very little about them. Are
                they necessary for what we need?

                The file that contains the above lines is available here:
                https://github.com/Fichberg/cogroo4/blob/labXP215_Will/cogroo-gc/src/main/java/org/cogroo/tools/checker/checkers/UIMAChecker.java
                The processCas method is available here:
                https://github.com/Fichberg/cogroo4/blob/labXP215_Will/cogroo-gc/src/main/java/org/cogroo/tools/checker/checkers/uima/UimaCasAdapter.java
                The script we are calling is available here:
                https://github.com/Fichberg/cogroo4/blob/labXP215_Will/cogroo-ruta/script/Main.ruta

                PS: Yes, we remembered the semicolons.

                Thanks for the help :)



                2015-05-26 15:30 GMT-03:00 Diego Buoro <[email protected]>:

                    I think I wasn't clear enough, so I should be
                    more specific.

                    I have a type system in which all words have been
                    annotated as Tokens. I am calling a RUTA script
                    from a Java class, and that script has only one rule:
                    Token Token {-> Problem}

                    However, with this script, no Problems are
                    created. When I try
                    Token {-> Problem}

                    I get one problem for each Token, which is what I
                    expected. Why can't I create annotations using
                    rules with more than one word?

                    Thanks




                    2015-05-26 14:49 GMT-03:00 Diego Buoro <[email protected]>:

                        Hello guys, how are you doing?

                        I would like to know, once I have called RUTA
                        from a Java project, how I can mark
                        consecutive tokens as a "Problem" (the name of
                        my annotation, in this case).

                        Thanks in advance!
