Hi,

Thanks for your suggestions. So I will need to change the anno_base_id datatype from "int" to "bigint" when scaling up.
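For reference, a minimal sketch of that schema change done through JDBC, assuming a MySQL-backed cTAKES database. The connection URL, credentials, and the anno_base table name are placeholders here; check them against your own DbConsumer schema before running anything like this.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    // Hypothetical one-off migration: widen anno_base_id from INT to BIGINT so
    // annotation ids do not overflow when processing tens of millions of notes.
    // URL, credentials, and table/column names are assumptions, not a fixed recipe.
    public class WidenAnnoBaseId {
        public static void main(String[] args) throws Exception {
            String url = "jdbc:mysql://localhost:3306/ctakes"; // placeholder
            try (Connection conn = DriverManager.getConnection(url, "user", "password");
                 Statement stmt = conn.createStatement()) {
                // MySQL syntax; other databases use ALTER COLUMN ... TYPE instead.
                stmt.executeUpdate("ALTER TABLE anno_base MODIFY anno_base_id BIGINT NOT NULL");
            }
        }
    }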
I am also looking at UIMA DUCC: http://uima.apache.org/doc-uimaducc-whatitam.html The problem with Hadoop is that it runs as a batch process, so it cannot be used for low-latency real-time systems. But I still want to explore it; see the mapper sketch after the quoted thread below.

On Tue, Jul 1, 2014 at 6:20 PM, Jonathan Bates <[email protected]> wrote:

> Hi Prasanna,
>
> I am currently using 3.1.2 to process ~40M notes using 14 CPEs with
> AggregatePlaintextUMLSProcessor + DBConsumer. So far, ~34M notes have been
> annotated and stored. Altogether, I'm seeing 0.054 sec/note. This is with
> 4.1k rows in v_snomed_fword_lookup. One modification we had to make was to
> change the anno_base_id datatype from 'int' to 'bigint'. It would be very
> interesting to see Hadoop used with cTAKES...
>
> -Jon
>
> On Tue, Jul 1, 2014 at 1:54 AM, Prasanna Bala <[email protected]> wrote:
>
>> Hi,
>>
>> I have some questions about the run time for processing documents with
>> cTAKES and about using third-party libraries with it. We are able to run
>> cTAKES in batch mode, but we now plan to process 1 million clinical
>> documents. Can anyone tell me whether they have tackled scalability with
>> cTAKES? My idea is to distribute the processing using Hadoop. There are
>> various libraries that can take a UIMA pipeline and distribute it over
>> Hadoop, and since cTAKES is built on UIMA, there should be a way to
>> distribute the processing. Has anyone tried this? Are there any
>> limitations to distributing work with cTAKES? Your thoughts, please?
>>
>> Regards,
>> Prasanna
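For anyone wanting to experiment with the Hadoop route discussed above, here is a minimal sketch of one way to run a UIMA analysis engine inside a Hadoop mapper. It is not an existing cTAKES-on-Hadoop recipe: the descriptor path, the choice of emitting an annotation count per note, and the surrounding job wiring are all assumptions, and a real job would also need the cTAKES resources (dictionaries, UMLS credentials) available on every node.

    import java.io.IOException;

    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.uima.UIMAFramework;
    import org.apache.uima.analysis_engine.AnalysisEngine;
    import org.apache.uima.jcas.JCas;
    import org.apache.uima.util.XMLInputSource;

    // Rough sketch: each map task hosts its own UIMA pipeline and pushes one
    // document (the record value) through it per call to map().
    public class CtakesMapper extends Mapper<Object, Text, Text, Text> {

        private AnalysisEngine engine;

        @Override
        protected void setup(Context context) throws IOException, InterruptedException {
            try {
                // Build the pipeline once per mapper and reuse it for every record.
                // The descriptor name mirrors the aggregate mentioned in the thread,
                // but where it lives on the cluster is an assumption.
                XMLInputSource in = new XMLInputSource("AggregatePlaintextUMLSProcessor.xml");
                engine = UIMAFramework.produceAnalysisEngine(
                        UIMAFramework.getXMLParser().parseResourceSpecifier(in));
            } catch (Exception e) {
                throw new IOException("Failed to initialize UIMA pipeline", e);
            }
        }

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            try {
                JCas jcas = engine.newJCas();
                jcas.setDocumentText(value.toString());
                engine.process(jcas);
                // Emit something simple per note, e.g. an annotation count; a real
                // job would serialize the CAS (XMI) or write to a database instead.
                context.write(new Text(key.toString()),
                              new Text(Integer.toString(jcas.getAnnotationIndex().size())));
            } catch (Exception e) {
                throw new IOException("UIMA processing failed", e);
            }
        }

        @Override
        protected void cleanup(Context context) {
            if (engine != null) {
                engine.destroy();
            }
        }
    }

The main design point is that the analysis engine is created once in setup() and reused across records, since pipeline initialization is far more expensive than processing a single note; the same consideration applies whatever distribution framework ends up being used.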
