Hi Eddie,
Thank you for your message. Yes, the profiling includes everything in my
client code, including the I/O.
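To see how much of that total is file reading rather than analysis, the two can be timed separately. The sketch below is plain Java with no UIMA dependency. `SplitTiming` and `analyzeStub` are hypothetical names, and `analyzeStub` only stands in for `controller.process(tcas)`. It also assumes the documents are UTF-8, which the original code does not state.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class SplitTiming {

    // Hypothetical stand-in for controller.process(tcas): counts whitespace-separated tokens.
    static int analyzeStub(String text) {
        return text.trim().isEmpty() ? 0 : text.trim().split("\\s+").length;
    }

    // Returns {ioNanos, processNanos}, each summed over all files.
    static long[] time(List<Path> files) {
        long ioNanos = 0;
        long processNanos = 0;
        for (Path file : files) {
            long t0 = System.nanoTime();
            String document;
            try {
                // Assumption: documents are UTF-8; readAllBytes reads the whole file.
                document = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            long t1 = System.nanoTime();
            analyzeStub(document);
            long t2 = System.nanoTime();
            ioNanos += t1 - t0;
            processNanos += t2 - t1;
        }
        return new long[] {ioNanos, processNanos};
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("doc", ".txt");
        Files.write(tmp, "a small test document".getBytes(StandardCharsets.UTF_8));
        long[] t = time(List.of(tmp));
        System.out.printf("I/O: %d us, process: %d us%n", t[0] / 1_000, t[1] / 1_000);
        Files.delete(tmp);
    }
}
```

Timing the whole batch of documents, rather than a single one, keeps JIT warm-up in the first few calls from dominating the numbers.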
I checked: casPoolSize is set to "1" in my CPM config file. Setting
casPoolSize="3" makes virtually no difference, which means that either
(a) loading my 2000 documents in the same thread or in a separate one
makes no difference, or (b) this parameter is not taken into account at
all.
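Eddie's point below, that the CPM threads can only overlap when enough CASes are available, can be modelled with a bounded pool of reusable objects passed between three threads. This is a simplified sketch in plain Java, not the CPM's implementation. `PooledPipeline`, `WorkItem` and the upper-casing "analysis" are hypothetical placeholders. With `poolSize` = 1 only one object circulates, so at any moment at most one of the three stages has work to do. With a larger pool the reader can fill the next object while the analyzer is still busy.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

public class PooledPipeline {

    // Hypothetical stand-in for a CAS: a reusable, mutable holder.
    static final class WorkItem {
        String text;
    }

    // Sentinel object that marks the end of the input.
    private static final WorkItem END = new WorkItem();

    // Runs reader -> analyzer -> consumer over docs; returns how many documents were consumed.
    static int run(String[] docs, int poolSize) {
        BlockingQueue<WorkItem> pool = new ArrayBlockingQueue<>(poolSize);
        for (int i = 0; i < poolSize; i++) pool.add(new WorkItem());
        BlockingQueue<WorkItem> toAnalyze = new ArrayBlockingQueue<>(poolSize + 1);
        BlockingQueue<WorkItem> toConsume = new ArrayBlockingQueue<>(poolSize + 1);
        AtomicInteger consumed = new AtomicInteger();

        Thread reader = new Thread(() -> {
            try {
                for (String doc : docs) {
                    WorkItem item = pool.take(); // blocks until a pooled object is free
                    item.text = doc;
                    toAnalyze.put(item);
                }
                toAnalyze.put(END);
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
        Thread analyzer = new Thread(() -> {
            try {
                for (WorkItem item; (item = toAnalyze.take()) != END; ) {
                    item.text = item.text.toUpperCase(); // placeholder "analysis"
                    toConsume.put(item);
                }
                toConsume.put(END);
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
        Thread consumer = new Thread(() -> {
            try {
                for (WorkItem item; (item = toConsume.take()) != END; ) {
                    consumed.incrementAndGet();
                    item.text = null;   // clear it, analogous to tcas.reset()
                    pool.put(item);     // hand the object back so the reader can reuse it
                }
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });

        reader.start();
        analyzer.start();
        consumer.start();
        try {
            reader.join();
            analyzer.join();
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the pipeline", e);
        }
        return consumed.get();
    }

    public static void main(String[] args) {
        String[] docs = {"first doc", "second doc", "third doc"};
        System.out.println(run(docs, 1) + " documents with pool size 1");
        System.out.println(run(docs, 3) + " documents with pool size 3");
    }
}
```

Overlapping the stages only pays off when reading or consuming takes a noticeable share of the time. If analysis dominates, a larger pool changes little, which would fit reading (a) above.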
With an aggregate engine, is each primitive engine executed in a separate
thread, or is the whole aggregate run in the same thread?
Thank you for your help,
Julien
2008/8/14 Eddie Epstein <[EMAIL PROTECTED]>
> Hi Julien,
>
> Using default settings, the CPM will run the collection reader in one
> thread, each processing pipeline in another, and finally another
> thread for the CAS consumers. These threads can only run concurrently
> if there are enough CASes. A CAS pool size of 1 limits all work to one
> thread at a time.
>
> Does your profile take into account the I/O time reading the documents?
>
> Eddie
>
> On Wed, Aug 13, 2008 at 10:03 AM, Julien Nioche
> <[EMAIL PROTECTED]> wrote:
> > Hi,
> >
> > I am slightly puzzled by the following case. I have integrated an
> > aggregate engine into my code in a very straightforward way:
> >
> > // reset the tcas for the next document
> > tcas.reset();
> >
> > // read the whole file: a single read() call may return fewer bytes
> > byte[] contents = new byte[(int) target.length()];
> > try (DataInputStream in = new DataInputStream(
> >         new BufferedInputStream(new FileInputStream(target)))) {
> >     in.readFully(contents);
> > }
> >
> > // decodes with the platform default charset
> > String document = new String(contents);
> >
> > tcas.setDocumentText(document);
> > tcas.setDocumentLanguage("en");
> >
> > controller.process(tcas);
> >
> > Using the aggregate engine from the CPM is more than 10x faster than my
> > client code; both run in a single thread. I profiled my application
> > and found that the slow part is:
> >
> >   87.9% - 50,781 ms  org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process
> >
> > i.e. the time is not spent in other parts of my code but in the process()
> > method.
> >
> > I get a similar difference even when setting casPoolSize="1" in my CPE
> > descriptor. Needless to say, I'd like to get the same level of
> > performance in both cases. Any idea what might be causing this?
> >
> > Thanks,
> >
> > Julien
> >
> > --
> > DigitalPebble Ltd
> > http://www.digitalpebble.com
> >
>