Disclaimer: I'm not a developer, just a user.

To use DBConsumer, I had to change the anno_base_id column in the SQL tables from "int" to "bigint" to prevent overflow in the annotation index. Speed did increase marginally after the change, but I don't understand how the change in datatype could have been the cause...

Let us know how it works out!
-Jon
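For a rough sense of why the overflow shows up at this scale, here is a back-of-the-envelope sketch in Python. It is not cTAKES code. It assumes that "int" is a signed 32-bit column and "bigint" a signed 64-bit one (true for the common SQL databases, but check yours), and that anno_base_id takes one new value per stored annotation row. The function names and the per-note counts are hypothetical.

```python
# Back-of-the-envelope check, not cTAKES code.
# Assumption: SQL "int" is signed 32-bit, "bigint" is signed 64-bit.
INT32_MAX = 2**31 - 1   # 2,147,483,647
INT64_MAX = 2**63 - 1

def max_rows_per_note(total_notes: int, id_max: int) -> int:
    """Largest average number of annotation rows per note that still
    fits under id_max, assuming ids are handed out one per row from 1."""
    return id_max // total_notes

def overflows(total_notes: int, rows_per_note: int, id_max: int) -> bool:
    """True if the total row count would exceed the largest usable id."""
    return total_notes * rows_per_note > id_max

notes = 40_000_000  # roughly the corpus size mentioned in this thread

print(max_rows_per_note(notes, INT32_MAX))   # 53
print(overflows(notes, 54, INT32_MAX))       # True
print(overflows(notes, 1000, INT64_MAX))     # False
```

Under these assumptions, an average of more than 53 annotation rows per note is enough to run past a 32-bit id on a 40M-note corpus, while a 64-bit id leaves ample headroom.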
On Tue, Jul 1, 2014 at 11:50 AM, Prasanna Bala <[email protected]> wrote:

> Hi,
>
> Thanks for your suggestions. So I have to change the "int" to "bigint" to
> improve the performance.
>
> I am looking at UIMA DUCC.
> http://uima.apache.org/doc-uimaducc-whatitam.html
>
> The problem with Hadoop is that it runs as a batch process, so it cannot be
> used for low latency real systems. But I still want to explore it.
>
>
> On Tue, Jul 1, 2014 at 6:20 PM, Jonathan Bates <[email protected]>
> wrote:
>
>> Hi Prasanna,
>> I am currently using 3.1.2 to process ~40M notes using 14 CPEs with
>> AggregatePlaintextUMLSProcessor+DBConsumer. So far, ~34M notes have been
>> annotated and stored. Altogether, I'm seeing 0.054 sec/note. This is with
>> 4.1k rows in v_snomed_fword_lookup. One modification we had to make was to
>> change the anno_base_id datatype from 'int' to 'bigint'. It would be very
>> interesting to see Hadoop used with ctakes...
>> -Jon
>>
>>
>> On Tue, Jul 1, 2014 at 1:54 AM, Prasanna Bala <
>> [email protected]> wrote:
>>
>>> Hi,
>>>
>>> I have a few questions about using third-party libraries with cTakes
>>> and about the run time for processing documents with cTakes. We are
>>> able to run cTakes in batch mode, but we plan to process 1 million
>>> clinical documents. Can anyone tell me if they have tackled scalability
>>> with cTakes? I have an idea to distribute the processing using Hadoop.
>>> There are various libraries available that can use UIMA and distribute
>>> the processing using Hadoop. Since cTakes is also built on UIMA, I think
>>> there should be a way to distribute the processing. Has anyone tried
>>> this? Are there any limitations in distributing work with cTakes?
>>> Your thoughts, please?
>>>
>>> Regards,
>>> Prasanna
>>>
>>
>>
>
