Re: AW: AW: Ruta - MARKFAST

Peter Klügl Thu, 23 May 2013 04:20:26 -0700

Hi,

On 23.05.2013 13:06, [email protected] wrote:
> Hello Peter,
>
> Now that I understand it, it's a nice feature.
>
> By the way, where can I find good documentation for Ruta? I only know of
> http://people.apache.org/~pkluegl/site/textmarker-current/tools.textmarker.book.html
>


That is the official documentation. An up-to-date version that describes
the new features since 2.0.0 can be found in the trunk.

I know that there are many passages and sections that need to be added or
improved, but it is hard to find enough time for it.

There is ongoing work by others to improve the description of the Java
integration for use cases in part-of-speech tagging, and we are
planning to provide screencasts for the Ruta Workbench.

Are there any specific passages that should be improved or added? Since I
implemented these features myself, I also easily forget to document
important information.

> and http://tmwiki.informatik.uni-wuerzburg.de/. A more detailed description 
> would be appreciated.

This wiki describes the old version hosted at SourceForge and should no
longer be used as a reference.

Best,

Peter

> Thanks,
> Armin
>
> -----Original Message-----
> From: Peter Klügl [mailto:[email protected]]
> Sent: Wednesday, 22 May 2013 15:09
> To: [email protected]
> Subject: Re: AW: Ruta - MARKFAST
>
> Hi,
>
> Yes, this example won't work without changes, because the word list is
> sensitive to whitespace: it distinguishes between "n.C." and "n. C.".
> I know this sounds like a bug, but it is rather a feature.
>
> To solve your problem, you could remove all spaces from your word list,
> add "n.Chr." and "v.Chr." (without spaces) to your word list, or retain
> the spaces before calling MARKFAST (Document{-> RETAINTYPE(SPACE)};).
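The first option, removing the spaces from the word list, can be done as a small preprocessing step before the list is loaded. A minimal Python sketch, assuming the word list is a UTF-8 text file with one entry per line; strip_spaces, normalize_word_list and the file paths are hypothetical helpers, not part of Ruta:

```python
import re


def strip_spaces(entries):
    """Remove all whitespace inside each entry; drop empty lines and duplicates
    (e.g. "n. C." and "n.C." both become "n.C."), keeping the original order."""
    seen = set()
    result = []
    for entry in entries:
        compact = re.sub(r"\s+", "", entry)
        if compact and compact not in seen:
            seen.add(compact)
            result.append(compact)
    return result


def normalize_word_list(src_path, dst_path):
    """Hypothetical helper: read a one-entry-per-line list and write the
    space-free version to dst_path."""
    with open(src_path, encoding="utf-8") as f:
        entries = f.read().splitlines()
    with open(dst_path, "w", encoding="utf-8") as f:
        f.write("\n".join(strip_spaces(entries)) + "\n")


if __name__ == "__main__":
    print(strip_spaces(["n. Chr.", "n.C.", "n. C.", "v. Chr."]))
    # prints ['n.Chr.', 'n.C.', 'v.Chr.']
```

The WORDLIST declaration would then point at the generated file instead of the original one. Note that this also removes the spaces inside multi-word entries such as "nach Christus", in line with the suggestion to remove all spaces.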
>
> The short explanation for this behavior is that, with the default filtering
> settings, neither the action nor the word list sees any spaces, so they check
> a candidate like "n.Chr.". In the trie, however, the path without a space
> before the "C" (from "n.C.") has no "h" after the "C", because "Chr." only
> occurs in the path for "n. Chr.", which contains the space.
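To make this explanation concrete, here is a minimal Python sketch of a whitespace-sensitive character trie. It is a simplified model under stated assumptions, not Ruta's implementation: the real matcher works on annotations in the filtered view rather than on plain strings, and build_trie, contains and visible_text are hypothetical names.

```python
# Simplified model (an assumption, not Ruta's actual implementation): the word
# list is a character trie, and the text checked against it is the document
# text with invisible (filtered) spaces removed.

END = object()  # sentinel key marking that a complete entry ends at this node


def build_trie(entries):
    """Insert every character of every entry, spaces included."""
    root = {}
    for entry in entries:
        node = root
        for ch in entry:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def contains(trie, candidate):
    """Follow the candidate through the trie; fail at the first missing child."""
    node = trie
    for ch in candidate:
        if ch not in node:
            return False
        node = node[ch]
    return END in node


def visible_text(text, retain_spaces=False):
    """Stand-in for the default filtering: spaces are invisible unless retained."""
    return text if retain_spaces else text.replace(" ", "")


if __name__ == "__main__":
    trie = build_trie(["n.C.", "n. Chr."])
    # Spaces filtered: candidate "n.Chr." walks n -> . -> C, which has no "h" child.
    print(contains(trie, visible_text("n. Chr.")))  # False
    # Spaces retained: candidate "n. Chr." follows the path with the space.
    print(contains(trie, visible_text("n. Chr.", retain_spaces=True)))  # True
    # Word list without spaces: "n.Chr." now exists as its own path.
    compact = build_trie(["n.C.", "n.Chr."])
    print(contains(compact, visible_text("n. Chr.")))  # True
```

The three calls mirror the failure and two of the workarounds: with spaces filtered the lookup stops at the missing "h", while retaining the spaces or storing the entry without spaces gives the candidate a complete path.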
>
> Best,
>
> Peter
>
> On 22.05.2013 10:52, [email protected] wrote:
>> Hi Peter,
>>
>> Your example works perfectly fine. But try this as both the word list and
>> the input document:
>>
>> nach Christus
>> nach der Zeitenwende
>> n. C.
>> n.C.
>> nC.
>> n. Chr.
>> n. d. Z.
>> n.d.Z.
>> unserer Zeit
>> unserer Zeitrechnung
>> u. Z.
>> u.Z.
>> v. C.
>> v.C.
>> vC.
>> v. Chr.
>> v. d. Z.
>> v.d.Z.
>> vor Christus
>> vor der Zeitenwende
>> vor unserer Zeitrechnung
>> v. u. Z.
>> v.u.Z.
>>
>> "n. Chr." and "v. Chr." are not recognized. Do you have the same result?
>>
>> Cheers,
>> Armin
>>
>>
>> -----Original Message-----
>> From: Peter Klügl [mailto:[email protected]]
>> Sent: Tuesday, 21 May 2013 19:58
>> To: [email protected]
>> Subject: Re: Ruta - MARKFAST
>>
>> Hi,
>>
>> On 21.05.2013 15:49, [email protected] wrote:
>>> Hello!
>>>
>>> Is there any possibility to match strings like
>>>
>>> nC.
>>> v. Chr.
>>>
>>> with MARKFAST?
>> Yes. Did you observe any problems? I just tested it with:
>>
>> Wordlist:
>> nC.
>> v. Chr.
>>
>> Input document:
>> nC.
>> v. Chr.
>> n C .
>> v . Chr.
>>
>> Script:
>> PACKAGE uima.ruta.tests;
>> WORDLIST testList = 'test.txt';
>> DECLARE Test;
>> Document{->MARKFAST(Test, testList)};
>>
>> ... creates four annotations of type Test.
>>
>> Best,
>>
>> Peter
>>
>>
>>
>>> Cheers,
>>> Armin