Okay.  That makes sense.  From reading the DuccBook, I saw in chapter 5 on 
services "A service is one or more long-running processes that await requests 
from UIMA pipeline components and return something in response."  Makes it 
sound like DUCC services only serve DUCC jobs, but if I can call services 
from components outside of DUCC control, that solves my problem.

Thanks / Dan

-----Original Message-----
From: Eddie Epstein [mailto:[email protected]] 
Sent: Friday, November 21, 2014 7:11 AM
To: [email protected]
Subject: Re: DUCC web server interfacing

On Thu, Nov 20, 2014 at 10:01 PM, D. Heinze <[email protected]> wrote:

> Eddie... thanks.  Yes, that sounds like I would not have the advantage 
> of DUCC managing the UIMA pipeline.
>

Depends on the definition of "managing". DUCC manages the lifecycle of analytic 
pipelines running both as job processes and as services; the difference lies in 
how DUCC decides how many instances of each to run. And you are right that DUCC 
sends work items to the analytic pipeline only for jobs.


>
> To break it down a little for the uninitiated (me),
>
>  1. how do I start a DUCC job that stays resident because it has high 
> startup cost (e.g. 2 minutes to load all the resources for the UIMA 
> pipeline VS about 2 seconds to process each request)?
>

Run the pipeline as a service. A service can be configured to start 
automatically, as soon as DUCC starts. If the load on the service increases, 
DUCC can be told [manually or programmatically] to launch additional service 
instances.
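
For concreteness: a registered service is described by a small properties file 
and managed with ducc_services. A minimal sketch, assuming a UIMA-AS service 
(verify the property names against the DuccBook for your release; the paths, 
queue, and broker values below are placeholders):

  # myservice.properties -- service registration sketch
  description              = Resident NLP pipeline
  process_DD               = /path/to/deployment-descriptor.xml
  process_memory_size      = 8
  process_jvm_args         = -Xmx6G
  classpath                = /path/to/pipeline/lib/*
  scheduling_class         = fixed
  working_directory        = /home/dan/service
  service_request_endpoint = UIMA-AS:MyPipelineQueue:tcp://broker-host:61617

  # register it and let DUCC keep one instance up from startup:
  ducc_services --register myservice.properties --autostart true --instances 1

Growing the service later is then something like
ducc_services --start <service-id> --instances <n>.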


> 2. once I have a resident job, how do I get the Job Driver to 
> iteratively feed references to each next document (as they are 
> received) to the resident Job Process?  Because all the input jobs 
> will be archived anyhow, I'm okay with passing them through the file system 
> if needed.
>

The easiest approach is to have an application driver, say a web service, 
directly feed input to the service. If using references as input, the same 
analytic pipeline could be used both for live processing as a service and for 
batch job processing.
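
To make "directly feed input" concrete, here is a minimal Java driver sketch 
that sends one document at a time to a UIMA-AS service and blocks for each 
reply. The broker URL and queue name are placeholders for whatever the 
service's deployment descriptor declares:

  import java.util.HashMap;
  import java.util.Map;

  import org.apache.uima.aae.client.UimaAsynchronousEngine;
  import org.apache.uima.adapter.jms.client.BaseUIMAAsynchronousEngine_impl;
  import org.apache.uima.cas.CAS;

  public class PipelineClient {
    public static void main(String[] args) throws Exception {
      // Connect to the broker/queue the service listens on (placeholders).
      UimaAsynchronousEngine engine = new BaseUIMAAsynchronousEngine_impl();
      Map<String, Object> ctx = new HashMap<String, Object>();
      ctx.put(UimaAsynchronousEngine.ServerUri, "tcp://broker-host:61617");
      ctx.put(UimaAsynchronousEngine.ENDPOINT, "MyPipelineQueue");
      ctx.put(UimaAsynchronousEngine.CasPoolSize, 2);
      engine.initialize(ctx);

      // One synchronous request: ~2 seconds of analysis against a
      // pipeline that stayed resident through its expensive startup.
      CAS cas = engine.getCAS();
      try {
        cas.setDocumentText("Document text, or a file reference to it.");
        engine.sendAndReceiveCAS(cas); // blocks until the reply arrives
        // ... pull results out of the CAS here ...
      } finally {
        cas.release();
      }
      engine.stop();
    }
  }

Since your inputs will be archived anyway, setDocumentText could just as well 
carry a file-system path that the service reads, as you suggested.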

DUCC jobs are designed for batch work, where the size of the input collection 
is known up front and job processes are replicated as widely as possible, given 
available resources and the job's fair share when multiple jobs are running.
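
For comparison, a batch job is submitted with ducc_submit plus a specification 
file; in that mode DUCC itself runs your collection reader and fans the work 
items out to the job processes. A sketch, with key names as I recall them from 
the DuccBook (verify against your release):

  # myjob.properties -- batch job specification sketch
  description           = Nightly batch run
  driver_descriptor_CR  = /path/to/MyCollectionReader.xml
  process_descriptor_AE = /path/to/MyAggregateAE.xml
  process_memory_size   = 8
  process_thread_count  = 2
  classpath             = /path/to/pipeline/lib/*
  scheduling_class      = normal

  ducc_submit --specification myjob.properties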

DUCC services are intended to support job pipelines (for example, a 
large-memory, low-latency analytic shared by many job process instances) or 
interactive applications.

Have you looked at creating a UIMA-AS service from a UIMA pipeline?
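
If not, the step is small: the existing aggregate AE descriptor stays 
unchanged, and a short deployment descriptor wraps it as a JMS service. 
Roughly (structure as in the UIMA-AS documentation's examples; the name, 
queue, and broker here are placeholders):

  <analysisEngineDeploymentDescription
      xmlns="http://uima.apache.org/resourceSpecifier">
    <name>MyPipelineService</name>
    <description>Wraps the existing pipeline as a UIMA-AS service</description>
    <deployment protocol="jms" provider="activemq">
      <casPool numberOfCASes="2"/>
      <service>
        <inputQueue endpoint="MyPipelineQueue"
                    brokerURL="tcp://broker-host:61617" prefetch="1"/>
        <topDescriptor>
          <!-- the unchanged aggregate AE descriptor -->
          <import location="MyAggregateAE.xml"/>
        </topDescriptor>
      </service>
    </deployment>
  </analysisEngineDeploymentDescription>

That same descriptor is what the service registration's process_DD property 
would point at.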

Eddie

