Re: Sentence splitter [EXTERNAL]

2018-03-13 Thread Finan, Sean
Just FYI, a lot of tokens start to get the IN part-of-speech tag with the different sentence breaks. For that reason we removed IN from the dictionary lookup's exclusion list in another project that uses the BIO detector: // This is the same as the default list except that "IN" is not excluded set

Re: Sentence splitter [EXTERNAL]

2018-03-13 Thread Tomasz Oliwa
Interesting: with the SentenceDetectorAnnotatorBIO the WordToken "aspirin" gets partOfSpeech = "IN", whereas with the regular SentenceDetectorAnnotator it is "NN". Looks like you were right, Tim, since IN stands for preposition or subordinating conjunction, as defined at

Re: Sentence splitter [EXTERNAL]

2018-03-13 Thread Miller, Timothy
That sounds bizarre! I can think of two possibilities: a sentence break in the middle of the word (unlikely), or the different sentence splits confused the POS tagger, which then tagged the word aspirin with a forbidden part of speech, like a preposition or something. If you check the token

Re: Sentence splitter [EXTERNAL]

2018-03-13 Thread Tomasz Oliwa
Hi, I tested SentenceDetectorAnnotatorBIO in cTAKES 4.0.0, simply by replacing SentenceDetectorAnnotator.xml with SentenceDetectorAnnotatorBIO.xml in AggregatePlaintextFastUMLSProcessor.xml. While it seemed to work, I noticed that in one example, an IdentifiedAnnotation was not found, that

Re: UmlsUserApprover Error [EXTERNAL]

2018-03-13 Thread Dligach, Dmitriy
Hi Sean and everybody, I just wanted to confirm that I intermittently run into the same issue. I was able to fix it yesterday by removing a bunch of files from /tmp (such as conn.xml, which dictionary lookup apparently creates there under my user name). However, today, the problem returned and

Re: UmlsUserApprover Error [EXTERNAL]

2018-03-13 Thread Andrew Phillips
Hi Sean, I looked into changing the relevant configs to non-checking, but didn't have much success. I am back to primarily trying to troubleshoot the original error. I have tried removing files from /tmp (especially conn.xml) as they seemed to be contributing to the issue. I also tried setting up

Ctakes throughput

2018-03-13 Thread Hannah Eyre
Hi, I am working on a project to extend cTAKES for processing a large number of documents and I am looking into possible routes for improving cTAKES performance. Is there any information detailing the 50,000 clinical notes per hour benchmark advertised on the cTAKES homepage? I am looking for