I had assumed that these zero-length annotations no longer cause problems... I was wrong, and I should do something about it. Either they really will be ignored completely from now on, or I need to change the sequential matching so that they are consumed somehow. If anyone is interested, I can explain the problems and symptoms in more detail in a new Jira issue.
Best,

Peter

On 11.06.2015 at 08:38, [email protected] wrote:
> Hi,
>
> yeah, that once hit me, too. It has something to do with the internal sorting
> of annotations with the same start offset. I annotated some metadata for the
> whole document in an annotation with start offset 0 and end offset 0. That's
> not good. The end offset must be the length of the document text. It's fine
> then.
>
> Cheers,
> Armin
>
> -----Original Message-----
> From: Peter Klügl [mailto:[email protected]]
> Sent: Wednesday, 10 June 2015 21:28
> To: [email protected]
> Subject: Re: Marking cosnecutive tokens with RUTA
>
> Hi,
>
> here are the results of my investigations:
>
> - the text of the document is not set directly. You should add something
> like cas.setDocumentText(sentence.getDocumentText()); before populating
> the CAS in your method. Otherwise there will be a DocumentAnnotation of
> length 0. Ruta does not like these... that's the source of the problem.
> If you add the line, or avoid zero-length annotations somehow, then the
> rules should work just fine.
>
> - I'd rather use tcas.addFsToIndexes(sentenceAnn); instead of
> tcas.getIndexRepository().addFS(sentenceAnn); (but that shouldn't change
> anything)
>
> - You access the problem type "cogroo.ruta.Base.PROBLEM", but the rules
> seem to use the type "Main.PROBLEM"
>
> Best,
>
> Peter
>
>
> On 03.06.2015 at 19:14, Diego Buoro wrote:
>> Hi Peter, the example we used is the small sentence inside a string at
>> the end of UIMAChecker.java: "Refiro-me à trabalho remunerado.".
>> Based on the Main.ruta we sent you, we expected the output to contain
>> 7 "PROBLEM" annotations. This part is working.
>> The problem is when we change the last line of Main.ruta from
>> "cgToken{->PROBLEM};" to "cgToken cgToken{->PROBLEM};". In this case we
>> expected 6 "PROBLEM" annotations: the same ones we had in the first
>> example, except for the first one. That's what happens when you run
>> the script in a simple Ruta project, but when we run it in the Java
>> application we get 0 "PROBLEM" annotations.
>> We think this difference arises because in the Ruta project we
>> don't use plain text as input. Instead, we feed it a preprocessed
>> xmi file. In the Java application, on the other hand, we do the
>> processing ourselves via the processCas method. It's possible that the
>> processCas method is creating tokens in a way that prevents us from
>> detecting when one is next to the other in the Ruta script.
>> We are sending you the xmi file to use as an example for a simple Ruta
>> project. If there are any other examples you'd like us to send you,
>> just say the word :D
>>
>> Best,
>>
>> Diego
>>
>> 2015-06-01 11:15 GMT-03:00 Diego Buoro <[email protected]>:
>>
>> Sorry, please disregard my last answer. The idea wasn't to use the
>> xmi; we are still working on a minimal example to provide to you.
>> We will send it to you in the next few days.
>>
>> 2015-06-01 10:37 GMT-03:00 Diego Buoro <[email protected]>:
>>
>> Hi Peter, how are you doing?
>>
>> We were trying to run it using files such as Crase01.xmi and
>> rule_xml_001.xmi.
>> Our goal is to run those two simpler ones first, and
>> then run it with Crase.xmi.
>>
>> About the package declaration, I still need to check which Ruta
>> version it is.
>> I will check this soon.
>>
>> All Best,
>>
>> Diego
>>
>> 2015-05-30 0:45 GMT-03:00 Diego Buoro <[email protected]>:
>>
>> Hi Peter!
>> No problem, I appreciate your support.
>>
>> All Best,
>>
>> Diego
>>
>> 2015-05-27 14:22 GMT-03:00 Diego Buoro <[email protected]>:
>>
>> Hi Peter!
>> We call the script with the following lines:
>>
>>     URL url = Resources.getResource("Main.ruta");
>>     String text = Resources.toString(url, Charsets.UTF_8);
>>     AnalysisEngineDescription aeDes =
>>         Ruta.createAnalysisEngineDescription(text, tsd);
>>     this.ae = UIMAFramework.produceAnalysisEngine(aeDes);
>>
>>     CAS cas = ae.newCAS();
>>     converter.populateCas(sentence.getTextSentence(), cas);
>>     ae.process(cas);
>>
>> The populateCAS method is responsible for translating
>> our annotations into RUTA annotations, but it doesn't
>> set any type priority explicitly.
>> We don't know much about type priorities; the RUTA
>> references we found say very little about them. Are
>> they necessary for what we need to do?
>>
>> The file that contains the above lines is available here:
>>
>> https://github.com/Fichberg/cogroo4/blob/labXP215_Will/cogroo-gc/src/main/java/org/cogroo/tools/checker/checkers/UIMAChecker.java
>> The processCAS method is available here:
>>
>> https://github.com/Fichberg/cogroo4/blob/labXP215_Will/cogroo-gc/src/main/java/org/cogroo/tools/checker/checkers/uima/UimaCasAdapter.java
>> The script we are calling is available here:
>>
>> https://github.com/Fichberg/cogroo4/blob/labXP215_Will/cogroo-ruta/script/Main.ruta
>>
>> PS: Yes, we remembered the semicolons.
>>
>> Thanks for the help :)
>>
>> 2015-05-26 15:30 GMT-03:00 Diego Buoro <[email protected]>:
>>
>> I think I wasn't clear enough, and I should be
>> more specific.
>>
>> I have a type system in which all words have been
>> annotated as Tokens. I am calling a RUTA script
>> from a Java class, and that script has only one rule:
>>     Token Token {-> Problem}
>>
>> However, with this script, no Problems are
>> created.
>> When I try
>>     Token {-> Problem}
>>
>> I get one Problem for each Token, which is what I
>> expected. Why can't I create annotations using
>> rules with more than one word?
>>
>> Thanks
>>
>> 2015-05-26 14:49 GMT-03:00 Diego Buoro <[email protected]>:
>>
>> Hello guys, how are you doing?
>>
>> I would like to know, once I have called RUTA
>> from a Java project, how I can mark
>> consecutive tokens as a "Problem" (the name of
>> my annotation, in this case).
>>
>> Thanks in advance!
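
A note on the sorting remark near the top of this thread: the default UIMA annotation index orders annotations by begin offset ascending, then by end offset descending, so a document-wide annotation normally comes before the first token. The toy model below sketches what changes when that annotation has end offset 0. It does not use UIMA or Ruta at all: the class AnnotationOrderSketch, the Span type and the token offsets are hypothetical and exist only for illustration, and type priorities (the final tie-breaker in UIMA) are left out. Whether this ordering is exactly what trips up Ruta's sequential matching is an assumption based on the remark above, not something the thread confirms.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: Span is a hypothetical stand-in, not a UIMA or Ruta class.
class AnnotationOrderSketch {

    // The example sentence quoted in the thread.
    static final String TEXT = "Refiro-me à trabalho remunerado.";

    static final class Span {
        final String label;
        final int begin;
        final int end;

        Span(String label, int begin, int end) {
            this.label = label;
            this.begin = begin;
            this.end = end;
        }
    }

    // Assumed ordering: begin ascending, then end descending (longer span first).
    // Type priorities are deliberately not modelled.
    static int compare(Span a, Span b) {
        if (a.begin != b.begin) {
            return Integer.compare(a.begin, b.begin);
        }
        return Integer.compare(b.end, a.end);
    }

    // Sorts a document-wide span plus two illustrative tokens and returns the labels.
    static List<String> sortedLabels(int documentEnd) {
        List<Span> spans = new ArrayList<>(Arrays.asList(
                new Span("Document", 0, documentEnd),
                new Span("Refiro", 0, 6),
                new Span("me", 7, 9)));
        spans.sort(AnnotationOrderSketch::compare);
        List<String> labels = new ArrayList<>();
        for (Span s : spans) {
            labels.add(s.label);
        }
        return labels;
    }

    public static void main(String[] args) {
        // Document text set: the document span covers everything and sorts first.
        System.out.println(sortedLabels(TEXT.length())); // [Document, Refiro, me]
        // Document text not set: the zero-length span lands between the tokens.
        System.out.println(sortedLabels(0)); // [Refiro, Document, me]
    }
}
```

In this model the zero-length span ends up between the first and second token, which fits the advice given above: set the document text before populating the CAS so that the document annotation spans the whole text.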
