Re: cTakes Scalability Problem

Jonathan Bates Wed, 02 Jul 2014 07:51:06 -0700

Disclaimer: I'm not a developer, just a user. To use DBConsumer, I had to
change the anno_base_id column from "int" to "bigint" in the SQL tables to
prevent overflow in the annotation index. Speed did increase marginally
after the change, but I don't understand how the change in datatype could
have been the cause... Let us know how it works out!
-Jon
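
The overflow mentioned above can be illustrated with a little arithmetic.
This is only a sketch under stated assumptions: it assumes "int" is a
signed 32-bit column and "bigint" a signed 64-bit column (as in common SQL
databases), and that anno_base_id grows by one for every stored annotation.
The function names and the annotations-per-note figures are hypothetical,
not taken from cTakes.

```python
INT32_MAX = 2**31 - 1  # assumed upper bound of a signed 32-bit "int" column
INT64_MAX = 2**63 - 1  # assumed upper bound of a signed 64-bit "bigint" column


def max_annotations_per_note(num_notes: int, id_max: int = INT32_MAX) -> int:
    """Largest average annotation count per note that still fits under id_max,
    assuming one id is consumed per annotation (an assumption, see above)."""
    return id_max // num_notes


def overflows(num_notes: int, annotations_per_note: int,
              id_max: int = INT32_MAX) -> bool:
    """True if the total number of ids needed exceeds the column's maximum."""
    return num_notes * annotations_per_note > id_max


if __name__ == "__main__":
    notes = 40_000_000  # corpus size mentioned in the thread
    print(max_annotations_per_note(notes))
    print(overflows(notes, 100))             # hypothetical 100 annotations/note
    print(overflows(notes, 100, INT64_MAX))  # same load with "bigint"
```

Under these assumptions, a 40M-note corpus runs out of 32-bit ids once
notes average more than a few dozen annotations each, which is consistent
with needing "bigint" for a run of this size.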



On Tue, Jul 1, 2014 at 11:50 AM, Prasanna Bala <[email protected]>
wrote:

> Hi,
>
> Thanks for your suggestions. So I have to change the "int" to "bigint" to
> improve the performance.
>
> I am looking at UIMA DUCC.
> http://uima.apache.org/doc-uimaducc-whatitam.html
>
> The problem with Hadoop is that it runs as a batch process, so it cannot
> be used for low-latency, real-time systems. But I still want to explore it.
>
>
> On Tue, Jul 1, 2014 at 6:20 PM, Jonathan Bates <[email protected]>
> wrote:
>
>> Hi Prasanna,
>> I am currently using 3.1.2 to process ~40M notes using 14 CPEs with
>> AggregatePlaintextUMLSProcessor+DBConsumer.  So far, ~34M notes have been
>> annotated and stored.  Altogether, I'm seeing 0.054sec/note.  This is with
>> 4.1k rows in v_snomed_fword_lookup.  One modification we had to make was to
>> change anno_base_id datatype from 'int' to 'bigint'.  It would be very
>> interesting to see Hadoop used with ctakes...
>> -Jon
>>
>>
>> On Tue, Jul 1, 2014 at 1:54 AM, Prasanna Bala <
>> [email protected]> wrote:
>>
>>> Hi,
>>>
>>> I have some questions about using third-party libraries with cTakes,
>>> and about run time when processing documents with it. We are able to
>>> run cTakes in batch mode, but we plan to process 1 million clinical
>>> documents. Can anyone tell me whether they have tackled scalability
>>> with cTakes? My idea is to distribute the processing using Hadoop.
>>> There are various libraries available that can use UIMA and distribute
>>> the processing using Hadoop. Since cTakes is also built on UIMA, I
>>> think there should be a way to distribute the processing. Has anyone
>>> tried this? Are there any limitations in distributing the work with
>>> cTakes? Your thoughts, please?
>>>
>>> Regards,
>>> Prasanna
>>>
>>
>>
>
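
The throughput figure quoted in the thread (0.054 sec/note overall, with
14 CPEs) can be turned into a rough wall-clock estimate for other corpus
sizes. This is a back-of-the-envelope sketch only: it assumes the rate
stays constant and that hardware, pipeline, and note length are comparable,
none of which the thread confirms. The function name is hypothetical.

```python
SEC_PER_NOTE = 0.054  # aggregate rate reported in the thread (14 CPEs)


def estimated_hours(num_notes: int, sec_per_note: float = SEC_PER_NOTE) -> float:
    """Wall-clock hours to process num_notes at a constant per-note rate."""
    return num_notes * sec_per_note / 3600.0


if __name__ == "__main__":
    for n in (1_000_000, 40_000_000):
        hours = estimated_hours(n)
        print(f"{n:>11,} notes: about {hours:.0f} hours ({hours / 24:.1f} days)")
```

At that rate, 1 million documents would take roughly 15 hours and the
40M-note run roughly 25 days, so a single multi-CPE setup may already be
enough for the 1M case before reaching for Hadoop.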
