Great minds think alike :-)
LeHouillier, Frank D. wrote:
To test your theory that it is the writing of Annotations to the CAS
that is taking so long, I ran an annotator with this code:
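import org.apache.uima.analysis_component.JCasAnnotator_ImplBase;
import org.apache.uima.analysis_engine.AnalysisEngineProcessException;
import org.apache.uima.jcas.JCas;
import org.apache.uima.jcas.tcas.Annotation;

public class TestAnnotator extends JCasAnnotator_ImplBase {
    @Override
    public void process(JCas jcas) throws AnalysisEngineProcessException {
        // Create and index 100000 identical annotations (offsets 1-2) so that
        // only the cost of adding annotations to the CAS is measured.
        for (int i = 0; i < 100000; i++) {
            Annotation a = new Annotation(jcas);
            a.setBegin(1);
            a.setEnd(2);
            a.addToIndexes();
        }
        System.out.println("Done");
    }
}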
This takes less than two seconds to run on my laptop. Is it possible
your bottleneck isn't where you think it is?
-----Original Message-----
From: rohan rai [mailto:[EMAIL PROTECTED]
Sent: Thursday, June 26, 2008 12:04 PM
To: [email protected]
Subject: Re: Annotation (Indexing) a bottleneck in UIMA in terms of speed
@Pascal: As I have already said, the timing does not scale linearly.
Secondly, the times I gave are approximate.
@Frank:
I was talking about the actual adding of annotations to the CAS.
A record is, let's say, the content of tags like these: <a>.....</a>,
and the document consists of such records.
Annotations are added like this:
MyType annotation = new MyType(jCas);
annotation.setBegin(start);
annotation.setEnd(end);
annotation.addToIndexes();
This takes a lot of time, which is not acceptable.
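The start and end offsets passed to setBegin/setEnd have to come from somewhere. A minimal sketch of how they could be computed for records delimited by <a>.....</a>, assuming the tags are never nested and that the annotation should cover only the text between the tags; the class RecordSpans and the method findRecordSpans are hypothetical names, not part of UIMA:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RecordSpans {
    // Assumption: records look like <a>...</a> and are not nested.
    // The reluctant (.*?) keeps adjacent records separate; DOTALL lets a
    // record span several lines. Compiled once and reused for every document.
    private static final Pattern RECORD = Pattern.compile("<a>(.*?)</a>", Pattern.DOTALL);

    /** Returns {begin, end} offsets of each record's content, tags excluded. */
    public static List<int[]> findRecordSpans(String text) {
        List<int[]> spans = new ArrayList<>();
        Matcher m = RECORD.matcher(text);
        while (m.find()) {
            spans.add(new int[] {m.start(1), m.end(1)});
        }
        return spans;
    }

    public static void main(String[] args) {
        String doc = "<a>first</a><a>second record</a>";
        for (int[] s : findRecordSpans(doc)) {
            System.out.println(s[0] + "-" + s[1] + ": " + doc.substring(s[0], s[1]));
        }
    }
}
```

Running main prints "3-8: first" and "15-28: second record". In an annotator, each span would then feed the snippet above as annotation.setBegin(s[0]) and annotation.setEnd(s[1]).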
Regards
Rohan
On Thu, Jun 26, 2008 at 8:15 PM, LeHouillier, Frank D. <[EMAIL PROTECTED]> wrote:
Just to clarify, what do you mean by "annotation"? Is there a
specific Analysis Engine that you are using? What is a "record"? Is
this a document? It would actually be surprising for many
applications if annotation were not the bottleneck, given that some
annotation processes are quite expensive, but this doesn't seem to be
what you mean here. I can't tell from your question whether it is the
process that determines the annotations that is the burden or the actual
adding of the annotations to the CAS.
-----Original Message-----
From: rohan rai [mailto:[EMAIL PROTECTED]
Sent: Thursday, June 26, 2008 7:36 AM
To: [email protected]
Subject: Annotation (Indexing) a bottleneck in UIMA in terms of speed
When I profile a UIMA application, I see that annotation takes a lot
of time. To annotate 1 record it takes around 0.06 seconds. Now you
may say that's good, but scale it up. It does not scale linearly, but
here is a rough estimate from experiments: 6000 records take 6 min to
annotate, and 800000 records take around 10 hrs to annotate, which is
bad.
One thing is that I am treating each record individually as a CAS.
Even if I treat all the records as a single CAS, it takes around 6-7 hrs,
which is still not good in terms of speed.
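Turning these figures into per-record times shows what "does not scale linearly" means here. Taking the 800000-record run as about 10 hours:

```latex
\begin{aligned}
\text{6000 records in 6 min:}\quad & \frac{6 \times 60\,\text{s}}{6000} = 0.06\,\text{s/record}\\
\text{linear extrapolation to 800000:}\quad & 800000 \times 0.06\,\text{s} = 48000\,\text{s} \approx 13.3\,\text{h}\\
\text{observed, one CAS per record:}\quad & \frac{10 \times 3600\,\text{s}}{800000} = 0.045\,\text{s/record}\\
\text{observed, single CAS:}\quad & \frac{(6\text{--}7) \times 3600\,\text{s}}{800000} \approx 0.027\text{--}0.032\,\text{s/record}
\end{aligned}
```

On these numbers the per-record cost at 800000 records is lower than at 6000, so the total grows somewhat less than linearly; the long run time comes from the record count multiplied by a per-record cost of a few hundredths of a second.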
Is there a way out?
Can I improve performance by any means?
Regards
Rohan