I am attempting to run the default FastPipeline to extract various features
from clinical text. One of the features I'd like to capture is the covered
text. However, when I run the Scala code below, calling getOriginalText
yields a "null" value for every annotation of type IdentifiedAnnotation. Is
this by design?

And if so, what would be a better way to extract the covered text? The
other features I need (subject, polarity, confidence, historyOf, and
snomed/CUI/TUI/PreferredText) I can acquire just fine. Effectively, the
goal here is to capture every identified annotation, relevant metadata, and
the original text (only my attempt at getting the covered text is shown):

def main(args: Array[String]): Unit = {
    val note =
       ... (Some long  example note.)
    val aed = ClinicalPipelineFactory.getDefaultPipeline
    val ae = AnalysisEngineFactory.createEngine(aed)
    val jcas =

    val index = jcas.getAnnotationIndex(IdentifiedAnnotation.`type`)
    val iter = index.iterator()
    while (iter.hasNext) {
      val annotation = iter.next().asInstanceOf[IdentifiedAnnotation]
      val fsArray = annotation.getOriginalText()
      if (fsArray != null) {
        for (featureStructure <- fsArray.toArray()) {
          val featureArray = featureStructure.getType().getFeatures()
          val strings = featureArray.map(x =>
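
For context, one possible fallback is to slice the note by each annotation's begin and end offsets. UIMA annotations expose getBegin and getEnd, and the Annotation base class also offers getCoveredText, which should return the same slice. Below is a minimal sketch of the offset idea only. Span is a hypothetical stand-in for an annotation, not a cTAKES type, and the note text is made up for illustration.

```scala
// Hypothetical stand-in for a UIMA annotation: only the offsets matter here.
final case class Span(begin: Int, end: Int)

object CoveredTextSketch {
  // Covered text is the slice of the document between the two offsets
  // (begin inclusive, end exclusive), mirroring UIMA's offset convention.
  def coveredText(documentText: String, span: Span): String =
    documentText.substring(span.begin, span.end)

  def main(args: Array[String]): Unit = {
    val note = "Patient denies chest pain." // made-up example text
    val span = Span(15, 25)                  // offsets of "chest pain"
    println(coveredText(note, span))
  }
}
```

In the real pipeline the equivalent would presumably be note.substring(annotation.getBegin, annotation.getEnd), assuming the JCas document text is the same string as note.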


Mike Trepanier
Mike Trepanier| Big Data Engineer | MetiStream, Inc. |  m...@metistream.com |
845 - 270 - 3129 (m) | www.metistream.com
