Just FYI, a lot of tokens start to get the IN POS tag with the different
sentence breaks. For that reason we removed IN from the dictionary lookup's
exclusion list in another project that uses the BIO detector:
// This is the same as the default list except that "IN" is not excluded
set
Interesting: with the SentenceDetectorAnnotatorBIO the WordToken "aspirin" gets
partOfSpeech = "IN", whereas with the regular SentenceDetectorAnnotator it is
"NN". Looks like you were right, Tim, since IN stands for preposition or
subordinating conjunction as defined at
That sounds bizarre! I can think of two possibilities: a sentence break in the
middle of the word (unlikely), or the different sentence splits confused the POS
tagger, which then tagged the word "aspirin" as a forbidden part of speech, like
a preposition or something. If you check the token
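
To make the hypothesis above concrete, here is a minimal sketch of how a part-of-speech exclusion list can hide a term from dictionary lookup. It is not cTAKES code: the `lookup` function, the `EXCLUDED_TAGS` set, and the one-word `DICTIONARY` are all hypothetical, and the tags other than IN are placeholders rather than the actual default list.

```python
# Hypothetical sketch, not cTAKES code: tokens whose POS tag is in an
# exclusion set are skipped before the dictionary is consulted.

# Illustrative exclusion set (assumption); the real default list is not
# reproduced here. Only "IN" is taken from the discussion in this thread.
EXCLUDED_TAGS = {"IN", "DT", "CC"}

# Toy dictionary standing in for a real terminology.
DICTIONARY = {"aspirin"}


def lookup(tokens, excluded_tags):
    """Return dictionary terms among (word, pos) pairs, skipping excluded tags."""
    return [
        word
        for word, pos in tokens
        if pos not in excluded_tags and word.lower() in DICTIONARY
    ]


# The same word with the two tags reported in this thread.
tagged_nn = [("aspirin", "NN")]
tagged_in = [("aspirin", "IN")]

found_nn = lookup(tagged_nn, EXCLUDED_TAGS)                 # word is looked up
found_in = lookup(tagged_in, EXCLUDED_TAGS)                 # word is skipped
found_in_fixed = lookup(tagged_in, EXCLUDED_TAGS - {"IN"})  # IN no longer excluded
```

In this sketch the mis-tagged token is dropped silently, which would match the symptom of a missing IdentifiedAnnotation. Removing IN from the exclusion set brings the term back, at the cost of also looking up genuine prepositions.
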
Hi,
I tested SentenceDetectorAnnotatorBIO in cTAKES 4.0.0, simply by replacing
SentenceDetectorAnnotator.xml with SentenceDetectorAnnotatorBIO.xml in
AggregatePlaintextFastUMLSProcessor.xml.
While it seemed to work, I noticed that in one example, an IdentifiedAnnotation
was not found, that
Hi Sean and everybody,
I just wanted to confirm that I intermittently run into the same issue. I was
able to fix it yesterday by removing a bunch of files from /tmp (such as
conn.xml, which dictionary lookup apparently creates there under my user name).
However, today, the problem returned and
Hi Sean,
I looked into changing the relevant configs to non-checking, but didn't
have much success. I am back to primarily trying to troubleshoot the
original error. I have tried removing files from /tmp (especially conn.xml)
as they seemed to be contributing to the issue. I also tried setting up
Hi,
I am working on a project to extend cTAKES to process a large number of
documents, and I am looking into possible routes for improving cTAKES
performance.
Is there any information detailing the 50,000 clinical notes per hour benchmark
advertised on the cTAKES homepage? I am looking for