Hi Julien,

Using default settings, the CPM will run the collection reader in one
thread, each processing pipeline in another, and finally another
thread for the CAS consumers. These threads can only run concurrently
if there are enough CASes in the pool. A CAS pool size of 1 therefore
limits all work to one thread at a time.
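
To illustrate the point about the pool size, here is a toy model in plain Java. It is only a sketch and not the CPM's actual implementation: the class CasPoolSketch, the method maxInFlight and the use of Integer tokens in place of real CASes are all made up for the illustration. Three threads stand in for the reader, one pipeline and the consumer, and the method reports how many tokens were ever checked out at the same time.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class CasPoolSketch {

    /** Runs reader -> pipeline -> consumer and returns the peak number of tokens in use at once. */
    public static int maxInFlight(int poolSize, int documents) throws InterruptedException {
        // Hypothetical stand-in for the CAS pool: each Integer is one reusable token.
        BlockingQueue<Integer> pool = new ArrayBlockingQueue<>(poolSize);
        for (int i = 0; i < poolSize; i++) {
            pool.put(i);
        }
        BlockingQueue<Integer> toPipeline = new ArrayBlockingQueue<>(documents);
        BlockingQueue<Integer> toConsumer = new ArrayBlockingQueue<>(documents);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();

        ExecutorService stages = Executors.newFixedThreadPool(3);

        // "Reader" stage: needs a free token before it can produce the next document.
        stages.submit(() -> {
            for (int d = 0; d < documents; d++) {
                Integer token = pool.take(); // blocks while every token is checked out
                peak.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
                toPipeline.put(token);
            }
            return null;
        });

        // "Pipeline" stage: passes each token on after (imaginary) analysis.
        stages.submit(() -> {
            for (int d = 0; d < documents; d++) {
                toConsumer.put(toPipeline.take());
            }
            return null;
        });

        // "Consumer" stage: finishes the document and returns the token to the pool.
        stages.submit(() -> {
            for (int d = 0; d < documents; d++) {
                Integer token = toConsumer.take();
                inFlight.decrementAndGet();
                pool.put(token);
            }
            return null;
        });

        stages.shutdown();
        stages.awaitTermination(30, TimeUnit.SECONDS);
        return peak.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("peak with pool size 1: " + maxInFlight(1, 50));
        System.out.println("peak with pool size 3: " + maxInFlight(3, 50));
    }
}
```

With a pool size of 1 the reader cannot fetch the next document until the consumer has released the only token, so the three stages take turns and the peak is always 1. With a larger pool the peak can rise up to the pool size, depending on scheduling.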

Does your profile take into account the I/O time reading the documents?
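
One way to check this would be to time the read and the process() call separately. The following is only a sketch: IoTimingSketch and timeReadAndProcess are hypothetical names, the Consumer<String> stands in for whatever sets the document text and calls process() in your code, and UTF-8 is an assumed encoding.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Consumer;

public class IoTimingSketch {

    /**
     * Reads the file, hands the text to the given step, and returns
     * { nanoseconds spent reading, nanoseconds spent in the step }.
     */
    public static long[] timeReadAndProcess(Path target, Consumer<String> step) throws IOException {
        long start = System.nanoTime();
        // Assumption: the documents are UTF-8; substitute the real encoding.
        String document = new String(Files.readAllBytes(target), StandardCharsets.UTF_8);
        long afterRead = System.nanoTime();

        step.accept(document); // e.g. set the document text and call process()
        long afterStep = System.nanoTime();

        return new long[] { afterRead - start, afterStep - afterRead };
    }
}
```

Summing the two values over all documents would show how much of the total is I/O and how much is really spent in the engine.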

Eddie

On Wed, Aug 13, 2008 at 10:03 AM, Julien Nioche
<[EMAIL PROTECTED]> wrote:
> Hi,
>
> I am slightly puzzled by the following case. I have integrated an aggregate
> engine into my code in a very straightforward way :
>
>  // reset the tcas for the next document
>  tcas.reset();
>
>  InputStream fis = new BufferedInputStream(new FileInputStream(target));
>  byte[] contents = new byte[(int) target.length()];
>  fis.read(contents);
>  fis.close();
>
>  String document = new String(contents);
>
>  tcas.setDocumentText(document);
>  tcas.setDocumentLanguage("en");
>
>  controller.process(tcas);
>
> Using the aggregate engine from the CPM is more than 10x faster than my
> client code; both are running in a single thread. I profiled my application
> and found that the slowest part is
>
> 87.9% - 50,781 ms
> org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process
>
> i.e. the time is not spent in other parts of my code but in the process()
> method.
>
> I get a similar difference even when setting casPoolSize="1" in my CPE
> descriptor. Needless to say, I'd like to get the same type of
> performance in both cases. Any idea of what might be the cause?
>
> Thanks
>
> Julien
>
> --
> DigitalPebble Ltd
> http://www.digitalpebble.com
>
