There are a few DUCC features that might be of particular interest for scaling out UIMA analytics.
- All user code for batch processing continues to use the existing UIMA component model: collection readers, CAS multipliers, analysis engines, and CAS consumers.**
- DUCC supports assembling and debugging a single-threaded process with these components, and then launching a highly scaled-out deployment with no code change.
- For applications that use too much RAM to utilize all the cores on worker machines, DUCC can do the vertical (thread) scaleout needed to share memory.
- DUCC automatically captures the performance breakdown of the UIMA-based processes, as well as process statistics including CPU, RAM, swap, page faults, and GC. Performance breakdown info for individual tasks (DUCC work items) can optionally be captured.
- DUCC has extensive error handling, automatically resubmitting work associated with uncaught exceptions, process crashes, machine failures, network failures, etc.
- Exceptions are easy to get to, and an attempt is made to make obvious the things that might be tricky to find, such as all the reasons a process might fail to start, without having to dig through DUCC framework logs.

** DUCC services introduce a new user-programmable component, a service pinger, that is responsible for validating that a service is operating correctly. The service pinger can also dynamically change the number of instances of a service, and it can restart individual instances that are determined to be acting badly.

Eddie

On Fri, Sep 15, 2017 at 10:32 AM, Osborne, John D <[email protected]> wrote:

> Thanks Richard and Nicholas,
>
> Nicholas - have you looked at SUIM (https://github.com/oaqa/suim) ?
>
> It's also doing UIMA on Spark - I'm wondering if you are aware of it and
> how it is different from your own project?
>
> Thanks for any info,
>
>  -John
>
>
> ________________________________________
> From: Richard Eckart de Castilho [[email protected]]
> Sent: Friday, September 15, 2017 5:29 AM
> To: [email protected]
> Subject: Re: UIMA analysis from a database
>
> On 15.09.2017, at 09:28, Nicolas Paris <[email protected]> wrote:
> >
> > - UIMA-AS is another way to program UIMA
>
> Here you probably meant uimaFIT.
>
> > - UIMA-FIT is complicated
> > - UIMA-FIT only work with UIMA
>
> ... and I suppose you mean UIMA-AS here.
>
> > - UIMA only focuses on text Annotation
>
> Yep. Although it has also been used for other media, e.g. video and audio.
> But the core UIMA framework doesn't specifically consider these media.
> People who apply it UIMA in the context of other media do so with custom
> type systems.
>
> > - UIMA is not good at:
> >   - text transformation
>
> It is not straight-forward but possible. E.g. the text normalizers in
> DKPro Core make use of either different views for different states of
> normalization or drop the original text and forward the normalized
> text within a pipeline by means of a CAS multiplier.
>
> >   - read data from source in parallel
> >   - write data to folder in parallel
>
> Not sure if these two are limitations of the framework
> rather than of the way that you use readers and writers
> in the particular scale-out mode you are working with.
>
> >   - machine learning interface
>
> UIMA doesn't offer ML as part of the core framework because
> that is simply not within the scope of what the UIMA framework
> aims to achieve.
>
> There are various people who have built ML around UIMA, e.g.
> ClearTK (http://cleartk.github.io/cleartk/) or DKPro TC
> (https://dkpro.github.io/dkpro-tc/) - and as you did, it
> can be combined in various ways with ML frameworks that
> specialize specifically on ML.
>
>
> Cheers,
>
> -- Richard
