Hi Prasanna,

I am currently using cTAKES 3.1.2 to process ~40M notes, running 14 CPEs with AggregatePlaintextUMLSProcessor+DBConsumer. So far, ~34M notes have been annotated and stored. Altogether, I'm seeing 0.054 sec/note. This is with 4.1k rows in v_snomed_fword_lookup.

One modification we had to make was to change the anno_base_id datatype from 'int' to 'bigint'.

It would be very interesting to see Hadoop used with cTAKES...

-Jon
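For a rough sense of scale, the figures above can be extrapolated with a few lines of Python. This is only a back-of-the-envelope sketch: it assumes the 0.054 sec/note rate is aggregate wall-clock time across all 14 CPEs, that anno_base_id is a sequential key with one value per stored annotation, and that 'int' means a signed 32-bit column. The function names are made up for illustration.

```python
# Back-of-the-envelope figures from the numbers reported in this thread.
SEC_PER_NOTE = 0.054    # reported rate; ASSUMED to be aggregate across all 14 CPEs
INT32_MAX = 2**31 - 1   # upper bound of a signed 32-bit 'int' column (ASSUMED type)


def wall_clock_hours(num_notes, sec_per_note=SEC_PER_NOTE):
    """Estimated wall-clock hours to process num_notes at the given rate."""
    return num_notes * sec_per_note / 3600.0


def max_annotations_per_note(num_notes, id_max=INT32_MAX):
    """Average annotations per note at which a sequential ID column runs out.

    ASSUMES one ID is consumed per stored annotation.
    """
    return id_max / num_notes


if __name__ == "__main__":
    for n in (1_000_000, 40_000_000):
        print(
            f"{n:>11,} notes: {wall_clock_hours(n):6.1f} h, "
            f"'int' IDs exhausted above {max_annotations_per_note(n):.1f} annotations/note"
        )
```

Under those assumptions, 1 million documents on a comparable setup would take roughly 15 hours and the full 40M about 600 hours (25 days). At 40M notes, an 'int' key would run out once notes average more than about 54 annotations each, which is consistent with needing 'bigint'.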
On Tue, Jul 1, 2014 at 1:54 AM, Prasanna Bala <[email protected]> wrote:
> Hi,
>
> I have a few questions about using third-party libraries with cTakes,
> and about run time when processing documents with cTakes. We are able
> to run cTakes in batch mode, but we plan to process 1 million clinical
> documents. Can anyone tell me whether they have tackled scalability
> with cTakes? I have an idea to distribute the processing using Hadoop.
> There are various libraries available that can use UIMA and distribute
> the processing using Hadoop. Since cTakes is also built on UIMA, I think
> there should be a way to distribute the processing. Has anyone tried
> this? Are there any limitations in distributing work with cTakes?
> Your thoughts, please?
>
> Regards,
> Prasanna
>
