To test your theory that it is the writing of Annotations to the CAS
that is taking so long, I ran an annotator with this code:
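
import org.apache.uima.analysis_component.JCasAnnotator_ImplBase;
import org.apache.uima.analysis_engine.AnalysisEngineProcessException;
import org.apache.uima.jcas.JCas;
import org.apache.uima.jcas.tcas.Annotation;

public class TestAnnotator extends JCasAnnotator_ImplBase {
    @Override
    public void process(JCas jcas) throws AnalysisEngineProcessException {
        // Create and index 100000 trivial annotations, so that only the
        // cost of writing annotations to the CAS is exercised.
        for (int i = 0; i < 100000; i++) {
            Annotation a = new Annotation(jcas);
            a.setBegin(1);
            a.setEnd(2);
            a.addToIndexes();
        }
        System.out.println("Done");
    }
}
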
This takes less than two seconds to run on my laptop. Is it possible
your bottleneck isn't where you think it is?
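
For a rough sense of scale, the figures quoted in this thread can be put
side by side. The sketch below is only arithmetic on those numbers. The
class and method names are made up for illustration, and it assumes that
"less than two seconds" is treated as a 2 s upper bound and that "around
10 hrs" is taken as exactly 10 hours.

```java
import java.util.Locale;

/** Back-of-the-envelope comparison of the per-item costs quoted in this thread. */
final class ThroughputCheck {

    /** Average seconds spent per item, given total elapsed seconds and an item count. */
    static double secondsPerItem(double totalSeconds, long items) {
        if (items <= 0) {
            throw new IllegalArgumentException("items must be positive");
        }
        return totalSeconds / items;
    }

    public static void main(String[] args) {
        // Reported: 6000 records take 6 min to annotate.
        double smallRun = secondsPerItem(6 * 60, 6_000);
        // Reported: 800000 records take around 10 hrs to annotate.
        double largeRun = secondsPerItem(10 * 3600, 800_000);
        // Test annotator: 100000 annotations indexed in under 2 s (upper bound).
        double indexingBound = secondsPerItem(2.0, 100_000);

        System.out.printf(Locale.ROOT, "6000-record run:    %.5f s per record%n", smallRun);
        System.out.printf(Locale.ROOT, "800000-record run:  %.5f s per record%n", largeRun);
        System.out.printf(Locale.ROOT, "indexing-only test: %.5f s per annotation%n", indexingBound);
    }
}
```

On these numbers a record costs about 0.06 s, while indexing one
annotation costs at most about 0.00002 s. Unless each record produces
thousands of annotations, addToIndexes() alone would not seem to account
for the time reported.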
-----Original Message-----
From: rohan rai [mailto:[EMAIL PROTECTED]
Sent: Thursday, June 26, 2008 12:04 PM
To: [email protected]
Subject: Re: Annotation (Indexing) a bottleneck in UIMA in terms of
speed
@Pascal: As I have already said, the timing does not scale linearly.
Secondly, the times I have specified are approximate.
@Frank:
I was talking about the actual adding of annotations to the CAS.
A record refers to, let's say, the content in tags like these: <a>.....</a>
and the document consists of such records.
The annotation is done via this method:
MyType annotation = new MyType(jCas);
annotation.setBegin(start);
annotation.setEnd(end);
annotation.addToIndexes();
This takes a lot of time, which is not desirable.
Regards
Rohan
On Thu, Jun 26, 2008 at 8:15 PM, LeHouiloes lier, Frank D. <
[EMAIL PROTECTED]> wrote:
> Just to clarify, what do you mean by "annotation"? Is there a
> specific Analysis Engine that you are using? What is a "record"? Is
> this a document? It would actually be surprising for many
> applications if annotation were not the bottleneck, given that some
> annotation processes are quite expensive, but this doesn't seem like
> what you mean here. I can't tell from your question whether it is the
> process that determines the annotations that is a burden or the actual
> adding of the annotations to the CAS.
>
> -----Original Message-----
> From: rohan rai [mailto:[EMAIL PROTECTED]
> Sent: Thursday, June 26, 2008 7:36 AM
> To: [email protected]
> Subject: Annotation (Indexing) a bottleneck in UIMA in terms of speed
>
> When I profile a UIMA application, I see that annotation takes a lot
> of time. Annotating 1 record takes around 0.06 seconds. Now you may
> say that is good, but scale it up. It does not scale linearly, but
> here is a rough estimate from experiments: 6000 records take 6 min to
> annotate, and 800000 records take around 10 hrs to annotate, which is
> bad.
> One thing is that I am treating each record individually as a CAS.
> Even if I treat all the records as a single CAS, it takes around
> 6-7 hrs, which is still not good in terms of speed.
>
> Is there a way out?
> Can I improve performance by any means??
>
> Regards
> Rohan
>