To test your theory that writing Annotations to the CAS is what takes
so long, I ran an annotator with this code:

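import org.apache.uima.analysis_component.JCasAnnotator_ImplBase;
import org.apache.uima.analysis_engine.AnalysisEngineProcessException;
import org.apache.uima.jcas.JCas;
import org.apache.uima.jcas.tcas.Annotation;

public class TestAnnotator extends JCasAnnotator_ImplBase {

        @Override
        public void process(JCas jcas) throws AnalysisEngineProcessException {
                // Create 100,000 trivial annotations and add each one to the CAS indexes.
                for (int i = 0; i < 100000; i++) {
                        Annotation a = new Annotation(jcas);
                        a.setBegin(1);
                        a.setEnd(2);
                        a.addToIndexes();
                }

                System.out.println("Done");
        }

}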

This takes less than two seconds to run on my laptop.  Is it possible
your bottleneck isn't where you think it is?
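
One way to check is to time the two stages separately: the work that
finds each record's begin/end offsets, and the calls that create the
annotation and add it to the indexes. Below is a minimal sketch of such
a timer using only System.nanoTime. StageTimer and its methods are
hypothetical names of my own, not part of UIMA, and the lambdas in main
are empty stand-ins for your real code.

```java
/** Hypothetical helper (not part of UIMA): accumulates wall-clock time for one stage. */
public class StageTimer {
    private long totalNanos = 0;
    private long calls = 0;

    /** Runs the given stage once and adds its elapsed time to the running total. */
    public void time(Runnable stage) {
        long start = System.nanoTime();
        try {
            stage.run();
        } finally {
            totalNanos += System.nanoTime() - start;
            calls++;
        }
    }

    public long calls() {
        return calls;
    }

    public double totalSeconds() {
        return totalNanos / 1e9;
    }

    /** Average cost per item, given a total time in seconds and an item count. */
    public static double perItemSeconds(double totalSeconds, long items) {
        return totalSeconds / items;
    }

    public static void main(String[] args) {
        StageTimer findOffsets = new StageTimer();
        StageTimer addToCas = new StageTimer();

        for (int i = 0; i < 1000; i++) {
            // Stand-in: locate the begin/end offsets of the next record.
            findOffsets.time(() -> { });
            // Stand-in: new MyType(jCas), setBegin, setEnd, addToIndexes.
            addToCas.time(() -> { });
        }

        System.out.printf("find offsets: %.6f s over %d calls%n",
                findOffsets.totalSeconds(), findOffsets.calls());
        System.out.printf("add to CAS:   %.6f s over %d calls%n",
                addToCas.totalSeconds(), addToCas.calls());
    }
}
```

For scale: two seconds for 100000 annotations is at most about
0.00002 seconds per annotation, against the 0.06 seconds per record
you reported. If the second timer stays that small in your pipeline,
the time is going somewhere else.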

-----Original Message-----
From: rohan rai [mailto:[EMAIL PROTECTED] 
Sent: Thursday, June 26, 2008 12:04 PM
To: [email protected]
Subject: Re: Annotation (Indexing) a bottleneck in UIMA in terms of speed

@Pascal: As I have already said, the timing does not scale linearly.
         Secondly, the times I specified are approximate.
@Frank:
    I was talking about the actual adding of annotations to the CAS.
    A record is, let's say, the content inside tags like <a>.....</a>,
    and the document consists of such records.
    Annotation is done via this method:
                               MyType annotation = new MyType(jCas);
                               annotation.setBegin(start);
                               annotation.setEnd(end);
                               annotation.addToIndexes();
    This takes a lot of time, which is not desirable.

Regards
Rohan


On Thu, Jun 26, 2008 at 8:15 PM, LeHouiloes lier, Frank D. <
[EMAIL PROTECTED]> wrote:

> Just to clarify, what do you mean by "annotation"?  Is there a
> specific Analysis Engine that you are using? What is a "record"? Is
> this a document?  It would actually be surprising for many
> applications if annotation were not the bottleneck, given that some
> annotation processes are quite expensive, but this doesn't seem like
> what you mean here. I can't tell from your question whether it is the
> process that determines the annotations that is a burden or the actual
> adding of the annotations to the CAS.
>
> -----Original Message-----
> From: rohan rai [mailto:[EMAIL PROTECTED]
> Sent: Thursday, June 26, 2008 7:36 AM
> To: [email protected]
> Subject: Annotation (Indexing) a bottleneck in UIMA in terms of speed
>
> When I profile a UIMA application, what I see is that annotation takes
> a lot of time. Profiling shows that annotating 1 record takes around
> 0.06 seconds. Now you may say that is good, but now scale up, although
> it does not scale up linearly. Here is a rough estimate from the
> experiments done: 6000 records take 6 min to annotate, and 800000
> records take around 10 hrs min to annotate, which is bad.
> One thing is that I am treating each record individually as a CAS. Even
> if I treat all the records as a single CAS it takes around 6-7 hrs,
> which is still not good in terms of speed.
>
> Is there a way out?
> Can I improve performance by any means??
>
> Regards
> Rohan
>
