Hi Tam,

The limit is 500 per job, not per machine. It comes from the memory needed
to keep a copy of all active "work item" CASes in the JD (Job Driver).

This has not been a problem for our users because work items are not
individual documents. Rather, they are groups of documents (or groups of
CASes): things like all files in a directory, or files containing many
documents. The CM (Cas Multiplier) running in each thread of each JP
(JobProcess) then reads the data and creates a CAS for each document (or
input CAS) to send down the pipeline. This also allows the output
corresponding to a work item (e.g. for many documents) to be grouped into a
single output file.

See the DUCC sample apps for an example of breaking a single text file into
many documents, and grouping all the output CASes for the documents in the
file into a single zipfile.

We are working on a change that will significantly increase the maximum
number of dispatched CASes.

Eddie

On Wed, Nov 5, 2014 at 8:55 PM, Thanh Tam Nguyen <[email protected]> wrote:

> Hi Eddie,
> I've checked the webserver. Since I have been testing on a small collection
> of documents (20 documents), there were 15 work items for the job.
>
> Did you mean 500 work items per machine?
>
> Regards,
> Tam
>
> On Thu, Nov 6, 2014 at 1:20 AM, Eddie Epstein <[email protected]> wrote:
>
> > Hi,
> >
> > There is a default limit of 500 work items dispatched at the same time.
> > How many dispatched are shown for the job?
> >
> > Eddie
> >
> >
> > On Wed, Nov 5, 2014 at 3:11 AM, Thanh Tam Nguyen <[email protected]> wrote:
> >
> > > Hi Eddie,
> > > Thanks for your email. I followed the documentation and I was able to run
> > > DUCC jobs using different user instead of user "ducc". But while I was
> > > watching the webserver, I only found one machine running the jobs. In the
> > > tab System>Machines, I can see all the machine statuses are "up". What
> > > should I do to run the jobs on all machines?
> > >
> > > Regards,
> > > Tam
> > >
> > > On Fri, Oct 31, 2014 at 9:37 PM, Eddie Epstein <[email protected]> wrote:
> > >
> > > > Hi Tam,
> > > >
> > > > In the install documentation,
> > > > http://uima.apache.org/d/uima-ducc-1.0.0/installation.html,
> > > > the section "Multi-User Installation and Verification" describes how to
> > > > configure setuid-root for ducc_ling so that DUCC jobs are run as the
> > > > submitting user instead of user "ducc".
> > > >
> > > > The setuid-root ducc_ling should be put on every DUCC node, in the same
> > > > place, and ducc.properties updated to point at that location.
> > > >
> > > > Eddie
> > > >
> > > >
> > > > On Fri, Oct 31, 2014 at 3:54 AM, Thanh Tam Nguyen <[email protected]> wrote:
> > > >
> > > > > Hi Eddie,
> > > > > Would you tell me in more detail how to set up DUCC for multiuser mode?
> > > > > FYI, I have successfully set up and run my UIMA analysis engine in
> > > > > single user mode. I also followed DUCCBOOK to set up ducc_ling but I am
> > > > > not sure how to get it working on a cluster of machines.
> > > > >
> > > > > Thanks,
> > > > > Tam
> > > > >
> > > > > On Thu, Oct 30, 2014 at 11:08 PM, Eddie Epstein <[email protected]> wrote:
> > > > >
> > > > > > The $DUCC_RUNTIME tree needs to be on a shared filesystem accessible
> > > > > > from all machines.
> > > > > > For single user mode ducc_ling could be referenced from there as well.
> > > > > > But for multiuser setup, ducc_ling needs setuid and should be installed
> > > > > > on the root drive.
> > > > > >
> > > > > > Eddie
> > > > > >
> > > > > > On Thu, Oct 30, 2014 at 10:08 AM, James Baker <[email protected]> wrote:
> > > > > >
> > > > > > > I've been working through the installation of UIMA DUCC, and have
> > > > > > > successfully got it set up and running on a single machine.
> > > > > > > I'd now like to move to running it on a cluster of machines, but
> > > > > > > it isn't clear to me from the installation guide as to whether I
> > > > > > > need to install DUCC on each node, or whether ducc_ling is the
> > > > > > > only thing that needs installing on the non-head nodes.
> > > > > > >
> > > > > > > Could anyone shed some light on the process please?
> > > > > > >
> > > > > > > Thanks,
> > > > > > > James
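
As a rough illustration of the work-item grouping described in the top reply
of this thread, the sketch below treats one text file as a single work item,
splits it into documents, and writes every per-document result into one zip
archive. It is plain Python, not DUCC or UIMA code: the function names, the
blank-line document separator, and the token-count "analysis" are all
assumptions made for the example.

```python
import io
import zipfile


def split_work_item(text):
    """Split one work item's text into documents.

    Assumption for this sketch: documents are separated by blank lines.
    """
    return [block.strip() for block in text.split("\n\n") if block.strip()]


def analyze(document):
    """Hypothetical stand-in for the analysis pipeline: counts tokens."""
    return f"tokens={len(document.split())}\n{document}\n"


def process_work_item(name, text):
    """Process every document in a work item and bundle results in one zip.

    Returns the zip archive as bytes, with one entry per document.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for index, document in enumerate(split_work_item(text)):
            archive.writestr(f"{name}-{index:04d}.txt", analyze(document))
    return buffer.getvalue()


if __name__ == "__main__":
    sample = "first document\n\nsecond document here\n\nthird"
    payload = process_work_item("sample", sample)
    with zipfile.ZipFile(io.BytesIO(payload)) as archive:
        print(archive.namelist())
```

With this shape, the number of work items tracked at once stays small (one
per file) even when each file holds many documents, which is the reason the
dispatch limit rarely matters in practice.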
