Great minds think alike :-)

LeHouillier, Frank D. wrote:
To test your theory that it is the writing of Annotations to the CAS
that is taking so long I ran an annotator with this code:
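import org.apache.uima.analysis_component.JCasAnnotator_ImplBase;
import org.apache.uima.analysis_engine.AnalysisEngineProcessException;
import org.apache.uima.jcas.JCas;
import org.apache.uima.jcas.tcas.Annotation;

public class TestAnnotator extends JCasAnnotator_ImplBase {

        @Override
        public void process(JCas jcas) throws AnalysisEngineProcessException {
                // Create and index 100,000 trivial annotations so that only
                // the cost of writing to the CAS is measured.
                for (int i = 0; i < 100000; i++) {
                        Annotation a = new Annotation(jcas);
                        a.setBegin(1);
                        a.setEnd(2);
                        a.addToIndexes();
                }

                System.out.println("Done");
        }

}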
This takes less than two seconds to run on my laptop.  Is it possible
your bottleneck isn't where you think it is?
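
One way to check where the time goes is to time the two phases separately: working out where the annotations belong, and creating and storing them. The following is only a minimal sketch using the plain JDK, not UIMA. The class name PhaseTiming, the Span class, and the use of an ArrayList as a stand-in for the CAS index are all assumptions made for illustration; the <a>...</a> record markup is taken from the example in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PhaseTiming {

    // Hypothetical stand-in for an annotation: just a begin/end offset pair.
    static final class Span {
        final int begin;
        final int end;

        Span(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }
    }

    // Assumed record markup, based on the <a>.....</a> example in the thread.
    private static final Pattern RECORD = Pattern.compile("<a>(.*?)</a>", Pattern.DOTALL);

    // Phase 1: decide where the annotations go (the analysis work).
    static List<int[]> findRecords(String text) {
        List<int[]> offsets = new ArrayList<>();
        Matcher m = RECORD.matcher(text);
        while (m.find()) {
            offsets.add(new int[] { m.start(1), m.end(1) });
        }
        return offsets;
    }

    // Phase 2: create and store the objects (stand-in for addToIndexes()).
    static List<Span> store(List<int[]> offsets) {
        List<Span> index = new ArrayList<>(offsets.size());
        for (int[] o : offsets) {
            index.add(new Span(o[0], o[1]));
        }
        return index;
    }

    public static void main(String[] args) {
        // Build a synthetic document of 100,000 records.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 100000; i++) {
            sb.append("<a>record ").append(i).append("</a>\n");
        }
        String text = sb.toString();

        long t0 = System.nanoTime();
        List<int[]> offsets = findRecords(text);
        long t1 = System.nanoTime();
        List<Span> index = store(offsets);
        long t2 = System.nanoTime();

        System.out.printf("find:  %d records in %.1f ms%n", offsets.size(), (t1 - t0) / 1e6);
        System.out.printf("store: %d spans in %.1f ms%n", index.size(), (t2 - t1) / 1e6);
    }
}
```

In a real annotator the same System.nanoTime() bracketing could be placed around the code that computes start and end, and separately around the new MyType(...) / addToIndexes() calls, to see which side dominates.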

-----Original Message-----
From: rohan rai [mailto:[EMAIL PROTECTED]
Sent: Thursday, June 26, 2008 12:04 PM
To: [email protected]
Subject: Re: Annotation (Indexing) a bottleneck in UIMA in terms of
speed

@Pascal: As I have already said, the timing does not scale linearly.
         Secondly, the times I have specified are approximate.
@Frank:
    I was talking about the actual adding of annotations to the CAS.
    A record refers to, let's say, content inside tags like these: <a>.....</a>
    and the document consists of such records.
    Annotation is done via this method:
                               MyType annotation = new MyType(jCas);
                               annotation.setBegin(start);
                               annotation.setEnd(end);
                               annotation.addToIndexes();
   This takes a lot of time, which is not acceptable.

Regards
Rohan


On Thu, Jun 26, 2008 at 8:15 PM, LeHouillier, Frank D. <
[EMAIL PROTECTED]> wrote:

Just to clarify, what do you mean by "annotation"? Is there a specific
Analysis Engine that you are using? What is a "record"? Is this a
document? It would actually be surprising for many applications if
annotation were not the bottleneck, given that some annotation processes
are quite expensive, but this doesn't seem like what you mean here. I
can't tell from your question whether it is the process that determines
the annotations that is a burden or the actual adding of the annotations
to the CAS.

-----Original Message-----
From: rohan rai [mailto:[EMAIL PROTECTED]
Sent: Thursday, June 26, 2008 7:36 AM
To: [email protected]
Subject: Annotation (Indexing) a bottleneck in UIMA in terms of speed

When I profile a UIMA application, what I see is that annotation takes
a lot of time. If I profile, I see that to annotate 1 record it takes
around 0.06 seconds. Now you may say that's good. Now scale up, although
it does not scale up linearly. But here is a rough estimate from the
experiments done:
6000 records take 6 min to annotate
800000 records take around 10 hrs to annotate
Which is bad.
One thing is that I am treating each record individually as a CAS. Even
if I treat all the records as a single CAS it takes around 6-7 hrs,
which is still not good in terms of speed.

Is there a way out?
Can I improve performance by any means??

Regards
Rohan
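
The figures quoted in the original message can be checked with a little arithmetic. The sketch below converts the reported totals into seconds per record and compares them with a straight linear extrapolation. The class and method names are made up for illustration, and the inputs are only the approximate numbers given in this thread (6 min for 6000 records, around 10 hrs for 800000 records, and under 2 seconds for 100000 bare annotations in the test annotator).

```java
public class ThroughputCheck {

    // Average cost of one record, given a total run time in seconds.
    static double secondsPerRecord(double totalSeconds, long records) {
        return totalSeconds / records;
    }

    // Hours a run would take if every record cost the same fixed amount.
    static double linearEstimateHours(double secondsPerRecord, long records) {
        return secondsPerRecord * records / 3600.0;
    }

    public static void main(String[] args) {
        // Approximate figures reported in the thread.
        double small = secondsPerRecord(6 * 60.0, 6000);        // 6 min for 6000 records
        double large = secondsPerRecord(10 * 3600.0, 800000);   // ~10 hrs for 800000 records
        double bare = secondsPerRecord(2.0, 100000);            // upper bound from the test annotator

        System.out.printf("6000 records:    %.3f s per record%n", small);
        System.out.printf("800000 records:  %.3f s per record%n", large);
        System.out.printf("bare annotation: under %.6f s each%n", bare);
        System.out.printf("linear estimate for 800000 records at %.3f s each: %.1f hrs%n",
                small, linearEstimateHours(small, 800000));
    }
}
```

On these numbers the large run comes to about 0.045 s per record against 0.06 s for the small run, and a linear extrapolation at 0.06 s per record would be roughly 13.3 hrs. Creating and indexing a bare annotation, at under 0.00002 s each in the test annotator, is orders of magnitude cheaper than either per-record figure.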
