On 03/28/2014 05:28 PM, Eddie Epstein wrote:
Another alternative would be to do the final flush in the Cas consumer's
destroy method.
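
A minimal sketch of that idea follows, using a plain Java stand-in rather than the real UIMA classes. BatchingConsumer and its method bodies are hypothetical; only the batch size of 50 and the idea of flushing in destroy come from this thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a Cas Consumer; not the UIMA API itself.
class BatchingConsumer {
    private static final int BATCH_SIZE = 50; // threshold mentioned in this thread
    private final List<String> pending = new ArrayList<>();
    private final List<String> committed = new ArrayList<>();

    // Analogous to process(): buffer the document, commit when the batch is full.
    void process(String document) {
        pending.add(document);
        if (pending.size() >= BATCH_SIZE) {
            flush();
        }
    }

    // Analogous to destroy(): commit whatever is left before the instance goes away.
    void destroy() {
        flush();
    }

    private void flush() {
        committed.addAll(pending); // a real consumer would commit to its data store here
        pending.clear();
    }

    int committedCount() { return committed.size(); }
    int pendingCount() { return pending.size(); }
}
```

With 120 documents, two full batches are committed during processing and the remaining 20 are committed only when destroy runs.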
Another issue to be aware of: in order to balance resources between jobs,
DUCC preempts job processes scheduled in a "fair-share" class.
This may not be acceptable for jobs that are doing incremental commits.
The solution is to schedule the job in a non-preemptable class.
On Fri, Mar 28, 2014 at 1:22 AM, reshu.agarwal <[email protected]> wrote:
On 03/28/2014 01:28 AM, Eddie Epstein wrote:
Hi Reshu,
The Job model in DUCC is for the Collection Reader to send "work item
CASes", where a work item represents a collection of work to be done by a
Job Process. For example, a work item could be a file or a subset of a file
that contains many documents, where each document would be individually put
into a CAS by the Cas Multiplier in the Job Process.
DUCC is designed so that after processing the "mini-collection" represented
by the work item, the Cas Consumer should flush any data. This is done by
routing the "work item CAS" to the Cas Consumer, after all work item
documents are completed, at which point the CC does the flush.
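
The flow described above can be sketched roughly as follows. This is a plain Java model, not UIMA or DUCC code: WorkItemFlushConsumer, its process signature, and the isWorkItemCas flag are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a consumer that flushes when the work item CAS arrives.
class WorkItemFlushConsumer {
    private final List<String> cache = new ArrayList<>();
    private int flushCount = 0;
    private int lastFlushSize = 0;

    // Document CASes are cached; the work item CAS, routed here after all of
    // its documents are done, acts as the signal to flush.
    void process(String payload, boolean isWorkItemCas) {
        if (isWorkItemCas) {
            flush();
        } else {
            cache.add(payload);
        }
    }

    private void flush() {
        lastFlushSize = cache.size(); // a real consumer would commit the cached data here
        flushCount++;
        cache.clear();
    }

    int flushCount() { return flushCount; }
    int lastFlushSize() { return lastFlushSize; }
    int cachedCount() { return cache.size(); }
}
```

Because the flush is driven by an item that every Job Process instance sees for its own work, it does not depend on a collection-level callback reaching that instance.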
The sample code described in
http://uima.apache.org/d/uima-ducc-1.0.0/duccbook.html#x1-1380009 uses the
work item CAS to flush data in exactly this way.
Note that the PersonTitleDBWriterCasConsumer is doing a flush (a commit) in
the process method after every 50 documents.
Regards
Eddie
On Thu, Mar 27, 2014 at 1:35 AM, reshu.agarwal <[email protected]> wrote:
On 03/26/2014 11:34 PM, Eddie Epstein wrote:
Hi Reshu,
The collectionProcessingComplete() method in UIMA-AS has a limitation: a
Collection Processing Complete request sent to the UIMA-AS Analysis Service
is cascaded down to all delegates; however, if a particular delegate is
scaled-out, only one of the instances of the delegate will get this call.
Since DUCC is using UIMA-AS to scale out the Job processes, it has no way
to deliver a CPC to all instances.
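
To illustrate why this matters for buffered output, here is a small simulation in plain Java. It is only a model: ScaledOutInstance and CpcLimitationDemo are hypothetical names, and the round-robin distribution of documents is an assumption made for the example, not a statement about how UIMA-AS dispatches work.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of one scaled-out instance that buffers until CPC.
class ScaledOutInstance {
    int buffered = 0;
    int committed = 0;

    void process() { buffered++; }

    void collectionProcessComplete() {
        committed += buffered;
        buffered = 0;
    }
}

class CpcLimitationDemo {
    // Returns how many documents stay uncommitted when only one instance gets the CPC call.
    static int uncommittedAfterSingleCpc(int instances, int documents) {
        List<ScaledOutInstance> pool = new ArrayList<>();
        for (int i = 0; i < instances; i++) {
            pool.add(new ScaledOutInstance());
        }
        for (int d = 0; d < documents; d++) {
            pool.get(d % instances).process(); // round-robin is an assumption
        }
        pool.get(0).collectionProcessComplete(); // only one instance receives the call
        int left = 0;
        for (ScaledOutInstance s : pool) {
            left += s.buffered;
        }
        return left;
    }
}
```

In this model, 100 documents spread over 4 instances leave 75 documents buffered after the single CPC call, while a single instance leaves none.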
The applications we have been running on DUCC have used the Work Item CAS
as a signal to CAS consumers to do CPC level processing. That is discussed
in the first reference above, in the paragraph "Flushing Cached Data".
Eddie
On Wed, Mar 26, 2014 at 9:48 AM, reshu.agarwal <[email protected]> wrote:
On 03/26/2014 06:43 PM, Eddie Epstein wrote:
Are you using standard UIMA interface code to Solr? If so, which Cas
Consumer?
Taking a quick look at the source code for SolrCASConsumer, the batch and
collection process complete methods appear to do nothing.
Thanks,
Eddie
On Wed, Mar 26, 2014 at 6:08 AM, reshu.agarwal <[email protected]> wrote:
On 03/21/2014 11:42 AM, reshu.agarwal wrote:
Hence we cannot do batch processing in the Cas Consumer, and that
increases our processing time. Is there any other option for this, or is
it a bug in DUCC?
Please reply on this problem. If I send documents to Solr one by one from
the Cas Consumer, without batching, and commit after each one, that is not
an optimal way to use it. Why is DUCC not calling the
collectionProcessComplete method of the Cas Consumer? And if I want it to,
what is the way to do this?
I am not able to find anything about this in the DUCC book.
Thanks in advance.
--
Thanks,
Reshu Agarwal
Hi Eddie,
I am not using standard UIMA interface code to Solr. I created my own Cas
Consumer. I will take a look at that too. But the problem is not specific
to Solr; I could use any store for my output. I want to do batch processing
and want to use collectionProcessComplete. Why is DUCC not calling it? I
checked it with UIMA-AS as well, and my Cas Consumer works fine with it and
also performs batch processing.
--
Thanks,
Reshu Agarwal
Hi Eddie,
I am using a Cas Consumer similar to the Apache UIMA example:
"apache-uima/examples/src/org/apache/uima/examples/cpe/PersonTitleDBWriterCasConsumer.java"
--
Thanks,
Reshu Agarwal
Hi Eddie,
You are right, and I know this fact. PersonTitleDBWriterCasConsumer does a
flush (a commit) in the process method after every 50 documents, and if
fewer than 50 documents remain, it commits them in the
collectionProcessComplete method. So if that method is not called, those
documents cannot be committed. That is why I want DUCC to call this method.
--
Thanks,
Reshu Agarwal
Hi,
The destroy method worked for me. It did what I wanted from the
collectionProcessComplete method.
--
Thanks,
Reshu Agarwal