Re: Marking consecutive tokens with RUTA

Diego Buoro Wed, 03 Jun 2015 10:30:14 -0700

Hi Peter, the example we used is the short sentence in the string at the
end of UIMAChecker.java: "Refiro-me à trabalho remunerado.".
Based on the Main.ruta we sent you, we expected the output to contain 7
"PROBLEM" annotations. This part is working.
The problem appears when we change the last line of Main.ruta from
"cgToken{->PROBLEM};" to "cgToken cgToken{->PROBLEM};". In this case we
expected 6 "PROBLEM" annotations: the same ones we had in the first
example, except for the first one. That's what happens when you run the
script in a simple Ruta project, but when we run it in the Java
application we get 0 "PROBLEM" annotations.
We think this difference arises because in the Ruta project we don't
use plain text as input. Instead, we feed it a preprocessed xmi file. In
the Java application, on the other hand, we do the processing ourselves via
the processCas method. It's possible that the processCas method is creating
tokens in a way that prevents the Ruta script from detecting when one is
next to the other.
We are sending you the xmi file to use as an example in a simple Ruta
project. If there are any other examples you'd like us to send you, just
say the word :D
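
One way to sanity-check the adjacency hypothesis above, independently of
Ruta, is to compare the token offsets directly. The sketch below is plain
Java with no UIMA dependency; the Span class and the countWithPredecessor
helper are hypothetical names introduced only for illustration, and the
offsets are copied from the Token elements in the attached xmi. It only
counts tokens that directly follow another token with nothing but
whitespace in between. It does not reproduce how Ruta matches sequences
internally.

```java
import java.util.Arrays;
import java.util.List;

public class TokenAdjacency {

    /** Hypothetical stand-in for a token annotation: only its offsets. */
    static final class Span {
        final int begin;
        final int end;

        Span(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }
    }

    /** True if the text between the two offsets is empty or only whitespace. */
    static boolean onlyWhitespace(String text, int from, int to) {
        return from <= to && text.substring(from, to).trim().isEmpty();
    }

    /** Counts tokens that directly follow another token, ignoring whitespace. */
    static int countWithPredecessor(String text, List<Span> tokens) {
        int count = 0;
        for (Span current : tokens) {
            for (Span previous : tokens) {
                if (previous != current
                        && previous.end <= current.begin
                        && onlyWhitespace(text, previous.end, current.begin)) {
                    count++;
                    break; // one predecessor is enough for this token
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // \u00e0 is the letter "à"; escaped so the offsets do not depend on file encoding.
        String text = "Refiro-me \u00e0 trabalho remunerado.";
        // Offsets copied from the uima:Token elements in the xmi (two tokens share 10-11).
        List<Span> tokens = Arrays.asList(
                new Span(0, 6), new Span(6, 9), new Span(10, 11), new Span(10, 11),
                new Span(12, 20), new Span(21, 31), new Span(31, 32));
        System.out.println(countWithPredecessor(text, tokens)); // prints 6
    }
}
```

Running the same check on the offsets that the Java application writes
into the CAS would show whether the two inputs differ at the offset level
or only in how Ruta sees them.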


Best,

Diego

2015-06-01 11:15 GMT-03:00 Diego Buoro <[email protected]>:

> Sorry, please disregard my last answer. The idea wasn't to use the xmi; we
> are still thinking of a minimal example to provide to you.
> We will send it to you in the next few days.
>
> 2015-06-01 10:37 GMT-03:00 Diego Buoro <[email protected]>:
>
>> Hi Peter, how are you doing?
>>
>> We were trying to run it using files such as Crase01.xmi and
>> rule_xml_001.xmi.
>> Our goal is to run those two simpler ones first, and then run it with
>> Crase.xmi.
>>
>> About the package declaration, I still need to check which Ruta version
>> it is. I will check this soon.
>>
>> All Best,
>>
>> Diego
>>
>>
>>
>>
>>
>> 2015-05-30 0:45 GMT-03:00 Diego Buoro <[email protected]>:
>>
>>> Hi Peter!
>>> No problem, I appreciate your support.
>>>
>>> All Best,
>>>
>>> Diego
>>>
>>> 2015-05-27 14:22 GMT-03:00 Diego Buoro <[email protected]>:
>>>
>>>> Hi Peter!
>>>> We call the script with the following lines:
>>>>
>>>> URL url = Resources.getResource("Main.ruta");
>>>> String text = Resources.toString(url, Charsets.UTF_8);
>>>> AnalysisEngineDescription aeDes =
>>>>     Ruta.createAnalysisEngineDescription(text, tsd);
>>>> this.ae = UIMAFramework.produceAnalysisEngine(aeDes);
>>>>
>>>> CAS cas = ae.newCAS();
>>>> converter.populateCas(sentence.getTextSentence(), cas);
>>>> ae.process(cas);
>>>>
>>>> The populateCas method is responsible for translating our annotations
>>>> into RUTA annotations, but it doesn't set any type priority explicitly.
>>>> We don't know much about type priorities; the RUTA references we found
>>>> say very little about them. Are they necessary for what we need?
>>>>
>>>> The file that contains the above lines is available here:
>>>>
>>>> https://github.com/Fichberg/cogroo4/blob/labXP215_Will/cogroo-gc/src/main/java/org/cogroo/tools/checker/checkers/UIMAChecker.java
>>>> The processCAS method is available here:
>>>>
>>>> https://github.com/Fichberg/cogroo4/blob/labXP215_Will/cogroo-gc/src/main/java/org/cogroo/tools/checker/checkers/uima/UimaCasAdapter.java
>>>> The script we are calling is available here:
>>>>
>>>> https://github.com/Fichberg/cogroo4/blob/labXP215_Will/cogroo-ruta/script/Main.ruta
>>>>
>>>> PS: Yes, we remembered the semicolons.
>>>>
>>>> Thanks for the help :)
>>>>
>>>>
>>>>
>>>> 2015-05-26 15:30 GMT-03:00 Diego Buoro <[email protected]>:
>>>>
>>>>> I think I wasn't clear enough, so I should be more specific.
>>>>>
>>>>> I have a type system in which all words have been annotated as Tokens.
>>>>> I am calling a RUTA script from a Java class, and that script has only one
>>>>> rule:
>>>>> Token Token {-> Problem}
>>>>>
>>>>> However, with this script, no Problems are created. When I try
>>>>> Token {-> Problem}
>>>>>
>>>>> I get one problem for each Token, which is what I expected. Why can't
>>>>> I create annotations using rules with more than one word?
>>>>>
>>>>> Thanks
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> 2015-05-26 14:49 GMT-03:00 Diego Buoro <[email protected]>:
>>>>>
>>>>>> Hello guys, how are you doing?
>>>>>>
>>>>>> I would like to know: once I have called RUTA from a Java project, how
>>>>>> can I mark consecutive tokens as a "Problem" (the name of my annotation,
>>>>>> in this case)?
>>>>>>
>>>>>> Thanks in advance!
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>

<?xml version="1.0" encoding="UTF-8"?><xmi:XMI xmlns:tcas="http:///uima/tcas.ecore" xmlns:xmi="http://www.omg.org/XMI" xmlns:cas="http:///uima/cas.ecore" xmlns:uima="http:///opennlp/uima.ecore" xmi:version="2.0"><cas:NULL xmi:id="0"/><cas:Sofa xmi:id="1" sofaNum="1" sofaID="_InitialView" mimeType="text" sofaString="Refiro-me à trabalho remunerado."/><tcas:DocumentAnnotation xmi:id="8" sofa="1" begin="0" end="32" language="pt-br"/><uima:Sentence xmi:id="13" sofa="1" begin="0" end="32"/><uima:Token xmi:id="17" sofa="1" begin="0" end="6" pos="prop" features="M=S" lexeme="Refiro"/><uima:Token xmi:id="27" sofa="1" begin="6" end="9" pos="pron-pers" features="M=1S=ACC" lexeme="-me"/><uima:Token xmi:id="37" sofa="1" begin="10" end="11" pos="prp" features="-" lexeme="a"><lemma>a</lemma></uima:Token><uima:Token xmi:id="48" sofa="1" begin="10" end="11" pos="art" features="F=S" lexeme="a"><lemma>o</lemma></uima:Token><uima:Token xmi:id="59" sofa="1" begin="12" end="20" pos="n" features="M=S" lexeme="trabalho"><lemma>trabalho</lemma></uima:Token><uima:Token xmi:id="70" sofa="1" begin="21" end="31" pos="v-pcp" features="M=S" lexeme="remunerado"><lemma>remunerar</lemma></uima:Token><uima:Token xmi:id="81" sofa="1" begin="31" end="32" pos="." features="-" lexeme="."><lemma>.</lemma></uima:Token><uima:Chunk xmi:id="92" sofa="1" begin="0" end="6" chunkType="NP" head="17"/><uima:Chunk xmi:id="98" sofa="1" begin="6" end="9" chunkType="NP" head="27"/><uima:Chunk xmi:id="104" sofa="1" begin="10" end="11" chunkType="PP" head="37"/><uima:Chunk xmi:id="110" sofa="1" begin="10" end="31" chunkType="NP" head="59"/><cas:View sofa="1" members="8 13 17 27 37 48 59 70 81 92 98 104 110"/></xmi:XMI>
