Some domain-specific data was already used in creating the POS and
chunking models.
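
The POS tagger itself is a wrapper around OpenNLP's maxent tagger, so one
way to experiment with a domain-specific model is to train and load it
through OpenNLP directly. A minimal sketch, assuming OpenNLP 1.5.x (the
version bundled with cTAKES 3.x) and a hypothetical model file
my-clinical-pos.bin:

import java.io.FileInputStream;
import java.io.InputStream;

import opennlp.tools.postag.POSModel;
import opennlp.tools.postag.POSTaggerME;

public class PosTagDemo {
    public static void main(String[] args) throws Exception {
        // Load a domain-specific POS model (hypothetical file name);
        // any model trained with OpenNLP's tagger loads this way.
        InputStream in = new FileInputStream("my-clinical-pos.bin");
        try {
            POSModel model = new POSModel(in);
            POSTaggerME tagger = new POSTaggerME(model);

            // Tag an already-tokenized sentence.
            String[] tokens = { "Patient", "denies", "chest", "pain", "." };
            String[] tags = tagger.tag(tokens);
            for (int i = 0; i < tokens.length; i++) {
                System.out.println(tokens[i] + "/" + tags[i]);
            }
        } finally {
            in.close();
        }
    }
}

The cTAKES POS annotator loads its model from a descriptor parameter, so
in principle you can point it at a retrained model; check the annotator's
descriptor for the exact parameter name in your version.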

For info on the chunker, see
https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.0+-+Chunker
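
Since that chunker is a wrapper around the OpenNLP chunker, incorporating
domain-specific chunk data boils down to training a new OpenNLP chunker
model from CoNLL-2000-style data. A rough sketch against the OpenNLP 1.5.x
API (the training signatures changed in later releases, so verify against
the version on your classpath; train.txt is a hypothetical file):

import java.io.BufferedOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;

import opennlp.tools.chunker.ChunkSample;
import opennlp.tools.chunker.ChunkSampleStream;
import opennlp.tools.chunker.ChunkerME;
import opennlp.tools.chunker.ChunkerModel;
import opennlp.tools.util.ObjectStream;
import opennlp.tools.util.PlainTextByLineStream;

public class TrainChunker {
    public static void main(String[] args) throws Exception {
        // train.txt: CoNLL-2000 format, one "word POS chunk-tag" per
        // line, with a blank line between sentences.
        ObjectStream<String> lines = new PlainTextByLineStream(
                new InputStreamReader(
                        new FileInputStream("train.txt"), "UTF-8"));
        ObjectStream<ChunkSample> samples = new ChunkSampleStream(lines);

        // cutoff=5, iterations=100 are the usual OpenNLP defaults.
        ChunkerModel model = ChunkerME.train("en", samples, 5, 100);
        samples.close();

        OutputStream out = new BufferedOutputStream(
                new FileOutputStream("my-clinical-chunker.bin"));
        try {
            model.serialize(out);
        } finally {
            out.close();
        }
    }
}

The wiki page above walks through pointing the cTAKES Chunker annotator at
the resulting model file.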

Tokenization in Apache cTAKES is rule-based.
The default tokenizer is described here:
http://ctakes.apache.org/apidocs/3.1.1/ctakes-core/org/apache/ctakes/core/nlp/tokenizer/TokenizerPTB.html
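
To make "rule-based" concrete: rather than consulting a trained
statistical model, the tokenizer applies fixed splitting rules (Penn
Treebank conventions, in TokenizerPTB's case). A toy illustration of the
idea, not cTAKES's actual rule set:

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ToyTokenizer {
    // One hand-written rule per token class; the real PTB rules also
    // handle contractions, hyphenation, ellipses, and more.
    private static final Pattern TOKEN = Pattern.compile(
            "\\d+(?:\\.\\d+)?"   // numbers, incl. decimals like 81.5
          + "|\\w+"              // words
          + "|[^\\w\\s]");       // any single punctuation mark

    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<String>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Pt. given aspirin 81.5 mg for pain."));
        // -> [Pt, ., given, aspirin, 81.5, mg, for, pain, .]
    }
}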


-- James

________________________________
From: Bala Krishnan [[email protected]]
Sent: Friday, October 31, 2014 2:25 AM
To: [email protected]
Subject: cTakes chunking problem.

Hi,

I just have a couple of clarifications. cTAKES uses various open-source NLP
libraries for sentence tokenization, POS tagging, and chunking. Can anyone
tell me what trained models are used for POS tagging and chunking? Are they
based on the GENIA corpus? I tried the GENIA tagger, but it gives me
different results from cTAKES. Can anyone suggest some ideas on
incorporating domain-specific corpora for tagging and chunking in cTAKES?

Regards,
Prasanna
