Have you examined the various DUCC and system log files for possible clues?

On Fri, Jun 12, 2020 at 1:16 PM Dr. Raja M. Suleman <[email protected]> wrote:
> Hi,
>
> Thank you for your reply, and I'm sorry I couldn't get back to this
> earlier.
>
> To get a better picture of the processing speed of DUCC, I made a dummy
> pipeline where the CollectionReader runs a for loop to generate 100k
> work items (so no disk reads). Each work item only has a simple string in
> it. These are then passed on to the CasMultiplier, where for each work item
> I create a new CAS with a DocumentInfo (again holding only a simple string
> value) and pass it as a new CAS to the CasConsumer. The CasConsumer doesn't
> do anything except write the document received in the CAS to the logger. So
> basically this pipeline isn't doing anything: there are no input reads, and
> the only output is the information added to the logger. Running this on the
> cluster with 2 slave nodes, each with 8 CPUs and 32GB of RAM, still takes
> more than 30 minutes. I don't understand how this is possible, since no
> heavy I/O processing is happening in the code.
>
> Any ideas, please?
>
> Thank you.
>
> On 2020/05/18 12:47:41, Eddie Epstein <[email protected]> wrote:
> > Hi,
> >
> > Removing the AE from the pipeline was a good idea to help isolate the
> > bottleneck. The other two most likely possibilities are the collection
> > reader pulling from Elasticsearch or the CAS consumer writing the
> > processing output.
> >
> > DUCC Jobs are a simple way to scale out compute bottlenecks across a
> > cluster. Scaleout may be of limited or no value for I/O bound jobs.
> > Please give a more complete picture of the processing scenario on DUCC.
> >
> > Regards,
> > Eddie
> >
> > On Sat, May 16, 2020 at 1:29 AM Raja Muhammad Suleman
> > <[email protected]> wrote:
> >
> > > Hi,
> > > I've been trying to run a very small UIMA DUCC cluster with 2 slave
> > > nodes having 32GB of RAM each. I wrote a custom Collection Reader to
> > > read data from an Elasticsearch index and dump it into a new index
> > > after certain analysis engine processing. The Analysis Engine is a
> > > simple sentiment analysis code. The performance I'm getting is very
> > > slow, as it is only able to process ~150 documents/minute.
> > > To test the performance without the analysis engine, I removed the AE
> > > from the pipeline, but I still did not get any improvement in
> > > processing speed. Can you please guide me as to where I might be going
> > > wrong or what I can do to improve the processing speed?
> > >
> > > Thank you.
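
For a sense of scale, the two figures reported in this thread can be put on a common footing. The short Python sketch below is only arithmetic on the numbers quoted in the messages (100k work items in more than 30 minutes, and ~150 documents/minute); the helper name `per_second` is illustrative and is not part of DUCC or UIMA. Because the dummy run took more than 30 minutes, its rate is an upper bound and its per-item time a lower bound.

```python
def per_second(items: int, minutes: float) -> float:
    """Convert a count of items processed in `minutes` into items per second."""
    return items / (minutes * 60)


# Dummy pipeline: 100k work items in "more than 30 minutes" (upper bound on rate).
dummy_rate = per_second(100_000, 30)

# Original Elasticsearch pipeline: ~150 documents per minute.
es_rate = per_second(150, 1)

# Lower bound on wall-clock time spent per work item in the dummy run.
ms_per_item = 1000 / dummy_rate

print(f"dummy pipeline: at most {dummy_rate:.1f} work items/s")     # 55.6
print(f"dummy pipeline: at least {ms_per_item:.0f} ms per item")    # 18
print(f"Elasticsearch pipeline: about {es_rate:.1f} documents/s")   # 2.5
```

Read this way, the dummy pipeline spends at least about 18 ms of wall-clock time per work item even though it does no real work, which may help narrow down which DUCC and system log files to look at first.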
