Hello, I am trying to use cTAKES to split MIMIC II notes into sections and run
NER within each section. The RegexSectionizer seems to be the most applicable,
since the notes generally have no explicit section dividers. However, the
section labels in ccda_sections.txt are quite limited. For example, they do not
match "HPI", "FH", "PMH", or many other commonly used section identifiers.
Other systems include much longer lists. For example, VU's sectag
https://orbit.nlm.nih.gov/browse-repository/software/nlp-information-extraction/negation-resolution/41-sectag
provides ~6800 synonyms and variants of section labels. I would be happy to
add them to RegexSectionizer; however, it is not obvious how to generate the
expected format. The sectag database assigns each label a "tree" code (e.g.
5.28 for "hpi"), which is not the same as the HL7-CCDA ID. Some, but not all,
entries have LOINC IDs. Similarly, I am not sure what Clarity is using for its
section tree.

1) Does it matter what string I use for the ID (other than avoiding
collisions)? Does anything downstream actually use the HL7 string?
2) If so, how does one generate the HL7 string?
3) Is there an easier way to do this, e.g. with BsvRegexSectionizer (which is
essentially undocumented)?

Thanks, Ryan King
