Hi,

Removing the AE from the pipeline was a good way to help isolate the
bottleneck. Since throughput did not improve without it, the two most
likely remaining culprits are the collection reader pulling from
Elasticsearch and the CAS consumer writing the processing output.
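One way to continue the isolation is to time each pipeline variant separately (reader only, then reader plus consumer) and compare documents per minute. The Java sketch that follows is a generic timing harness, not UIMA or Elasticsearch code. The stages are hypothetical stand-ins that simulate I/O latency with `Thread.sleep`, and the 5 ms fetch and 20 ms write figures are invented for illustration.

```java
import java.util.function.IntConsumer;

/**
 * Minimal sketch: estimate throughput (documents per minute) of a pipeline
 * variant by timing N calls of a stand-in stage. The stages used in main()
 * are hypothetical placeholders, not UIMA or Elasticsearch APIs.
 */
public class StageTiming {

    /** Runs stage.accept(i) for i in [0, docs) and returns docs per minute. */
    public static double docsPerMinute(int docs, IntConsumer stage) {
        long start = System.nanoTime();
        for (int i = 0; i < docs; i++) {
            stage.accept(i);
        }
        // Guard against a zero elapsed time for trivially fast stages.
        long elapsedNanos = Math.max(1L, System.nanoTime() - start);
        return docs * 60_000_000_000.0 / elapsedNanos;
    }

    /** Simulates I/O latency; stand-in for a real fetch or write. */
    public static void simulateIo(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        int docs = 20;
        // Variant 1: reader only (hypothetical 5 ms per fetch).
        double readerOnly = docsPerMinute(docs, i -> simulateIo(5));
        // Variant 2: reader + consumer (hypothetical 5 ms fetch + 20 ms write).
        double readerAndConsumer = docsPerMinute(docs, i -> {
            simulateIo(5);
            simulateIo(20);
        });
        System.out.printf("reader only:       %.0f docs/min%n", readerOnly);
        System.out.printf("reader + consumer: %.0f docs/min%n", readerAndConsumer);
    }
}
```

If the reader-only rate is already close to the ~150 documents/minute seen on the cluster, the reader is the first place to look; if it is much higher, the consumer side is the more likely suspect.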

DUCC jobs are a simple way to scale out compute bottlenecks across a
cluster. Scale-out may be of limited or no value for I/O-bound jobs.
Please give a more complete picture of the processing scenario on DUCC.
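A rough way to see why adding nodes may not help an I/O-bound job: if one reader feeds N parallel workers and all workers write to a shared store, end-to-end throughput cannot exceed the slowest of those three limits. The Java sketch assumes this simplified model; the method name and all rates are hypothetical inputs, not measurements from this cluster or a description of DUCC internals.

```java
/**
 * Simplified bottleneck model (an assumption, not DUCC's actual scheduler):
 * a single reader feeds `workers` parallel workers, and every worker writes
 * to one shared store whose total capacity is fixed.
 */
public class PipelineBound {

    /** Upper bound on end-to-end throughput in docs/min. */
    public static double maxThroughput(double readerRate,
                                       double perWorkerRate,
                                       int workers,
                                       double sharedStoreRate) {
        // Compute scales with the number of workers...
        double computeRate = perWorkerRate * workers;
        // ...but the single reader and the shared store do not.
        return Math.min(readerRate, Math.min(computeRate, sharedStoreRate));
    }

    public static void main(String[] args) {
        // Hypothetical rates: reader 150, per-worker 600, shared store 1000 docs/min.
        for (int workers : new int[] {1, 4, 16}) {
            System.out.printf("workers=%d -> at most %.0f docs/min%n",
                    workers, maxThroughput(150, 600, workers, 1000));
        }
    }
}
```

With the reader limited to 150 docs/min in this example, the bound stays at 150 regardless of worker count; only speeding up the reader would raise it.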

Regards,
Eddie


On Sat, May 16, 2020 at 1:29 AM Raja Muhammad Suleman <
sulem...@edgehill.ac.uk> wrote:

> Hi,
> I've been trying to run a very small UIMA DUCC cluster with two slave nodes,
> each with 32GB of RAM. I wrote a custom Collection Reader that reads data
> from an Elasticsearch index; after analysis engine processing, the results
> are written to a new index. The Analysis Engine is simple sentiment
> analysis code. Performance is very slow: the pipeline processes only
> ~150 documents/minute.
> To test performance without the analysis engine, I removed the AE from
> the pipeline, but processing speed did not improve. Can you please advise
> where I might be going wrong or what I can do to improve processing speed?
>
> Thank you.
