Just to clarify, what do you mean by "annotation"? Is there a specific Analysis Engine that you are using? What is a "record"? Is it a document? For many applications it would actually be surprising if annotation were not the bottleneck, given that some annotation processes are quite expensive, but that doesn't seem to be what you mean here. I can't tell from your question whether the burden is the process that determines the annotations or the actual adding of the annotations to the CAS.
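If it helps to narrow this down, one option is to time the two phases separately. What follows is only a rough sketch in plain Java, not UIMA code: `findSpans` and `addToIndex` are hypothetical stand-ins for your annotator's analysis logic and for the step that adds annotations to the CAS, and the whitespace tokenizer is just a placeholder workload. In a real annotator you would put the same `System.nanoTime()` calls around your own analysis code and around the calls that add annotations to the indexes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PhaseTimer {
    // Hypothetical stand-in for the analysis step: finds the [begin, end)
    // offsets of whitespace-separated tokens in one record.
    static List<int[]> findSpans(String text) {
        List<int[]> spans = new ArrayList<>();
        int start = -1;
        for (int i = 0; i <= text.length(); i++) {
            boolean inToken = i < text.length()
                    && !Character.isWhitespace(text.charAt(i));
            if (inToken && start < 0) {
                start = i;                          // a token begins here
            } else if (!inToken && start >= 0) {
                spans.add(new int[] {start, i});    // a token ended at i
                start = -1;
            }
        }
        return spans;
    }

    // Hypothetical stand-in for the indexing step (adding annotations to the CAS).
    static void addToIndex(List<int[]> index, List<int[]> spans) {
        index.addAll(spans);
    }

    // Returns {nanoseconds spent finding spans, nanoseconds spent indexing,
    // total number of spans indexed} over all records.
    static long[] profile(List<String> records) {
        long findNanos = 0;
        long indexNanos = 0;
        List<int[]> index = new ArrayList<>();
        for (String record : records) {
            long t0 = System.nanoTime();
            List<int[]> spans = findSpans(record);
            long t1 = System.nanoTime();
            addToIndex(index, spans);
            long t2 = System.nanoTime();
            findNanos += t1 - t0;
            indexNanos += t2 - t1;
        }
        return new long[] {findNanos, indexNanos, index.size()};
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("first record", "a second record");
        long[] result = profile(records);
        System.out.println("analysis ns: " + result[0]);
        System.out.println("indexing ns: " + result[1]);
        System.out.println("spans indexed: " + result[2]);
    }
}
```

If the first number dominates, the cost is in deciding what to annotate; if the second dominates, it is in adding the annotations to the CAS, and those two cases call for different fixes.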
-----Original Message-----
From: rohan rai [mailto:[EMAIL PROTECTED]
Sent: Thursday, June 26, 2008 7:36 AM
To: [email protected]
Subject: Annotation (Indexing) a bottleneck in UIMA in terms of speed

When I profile a UIMA application, what I see is that annotation takes a lot of time. Profiling shows that annotating 1 record takes around 0.06 seconds.

Now you may say that's good. Now scale up. It does not scale up linearly, but here is a rough estimate from the experiments done:

6000 records take 6 min to annotate
800000 records take around 10 hrs min to annotate

Which is bad.

One thing is that I am treating each record individually as a CAS. Even if I treat all the records as a single CAS, it takes around 6-7 hrs, which is still not good in terms of speed.

Is there a way out? Can I improve performance by any means?

Regards
Rohan
