I worked around it by specifying an empty seeders list, which removes the 
default seeders; we don’t need the MARKUP annotations anymore.
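
For reference, the workaround amounts to something like the fragment below in 
the Ruta engine descriptor. This is only a sketch: I am assuming the parameter 
is the one named "seeders" (RutaEngine.PARAM_SEEDERS) and that an empty array 
overrides the default, org.apache.uima.ruta.seed.DefaultSeeder. The rest of 
the descriptor is left out.

```xml
<!-- Inside <configurationParameterSettings> of the Ruta engine descriptor. -->
<!-- Assumption: "seeders" is the multi-valued String parameter that lists -->
<!-- the seeder classes; an empty array means no seeder runs at all. -->
<nameValuePair>
  <name>seeders</name>
  <value>
    <array/>
  </value>
</nameValuePair>
```

Without a seeder, MARKUP and the other seeded token types are not created, so 
this only works if the rules do not depend on them.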

I still don’t know why the seeding created so much overhead, but at times it 
seemed to rival the POS tagger in processing time.

Anyway, this leads me to the next question. Can I disable the creation of Ruta 
basic annotations entirely to save processing overhead, and apply Ruta rules 
only to annotation types created by other AEs, such as our own?

Cheers
Mario

> On 21 Dec 2015, at 16:09 , Mario Juric <[email protected]> wrote:
> 
> Hi Peter,
> 
> I noticed that occasionally the initialisation in 
> RutaEngine::initializeStream can take a very long time. I can’t really 
> explain it, and it seems independent of document length since I have seen 
> this even with very small XML documents.
> 
> The method seems to spend much time in the DefaultSeeder when creating MARKUP 
> annotations during subiterator.moveToNext calls (line 89) and inside 
> Subiterator it seems to be the while loop inside adjustForStrictForward (line 
> 232), which is inside UIMA core classes. I haven’t gone into any deeper 
> analysis yet, but I would first like to hear whether you have an idea what 
> the main cause(s) could be.
> 
> We use Ruta 2.3.1 with UIMA 2.8.1
> 
> 
> Cheers
> Mario
