I didn't follow the thread closely, so I may be off topic here, but I thought I would share my working strategy for testing collection readers in Groovy, even though it may be too simplistic for many situations.
My unit tests for our collection readers start off with one line:

JCas jCas = TestsUtil.processCR("desc/test/myCRdesc.xml", 0)

followed immediately by assertions of what I expect to be in the JCas.

The method TestsUtil.processCR looks like this:

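import org.apache.uima.UIMAFramework
import org.apache.uima.analysis_engine.AnalysisEngine
import org.apache.uima.collection.CollectionReader
import org.apache.uima.jcas.JCas
import org.apache.uima.resource.ResourceSpecifier
import org.apache.uima.util.XMLInputSource

static JCas processCR(String descriptorFileName, int documentNumber)
{
    // Build a do-nothing analysis engine only to obtain a JCas from it
    XMLInputSource xmlInput = new XMLInputSource(new File("desc/annotators/EmptyAnnotator.xml"))
    ResourceSpecifier specifier = UIMAFramework.getXMLParser().parseResourceSpecifier(xmlInput)
    AnalysisEngine analysisEngine = UIMAFramework.produceAnalysisEngine(specifier)
    JCas jCas = analysisEngine.newJCas()

    // Instantiate the collection reader under test from its descriptor
    xmlInput = new XMLInputSource(new File(descriptorFileName))
    specifier = UIMAFramework.getXMLParser().parseResourceSpecifier(xmlInput)
    CollectionReader collectionReader = UIMAFramework.produceCollectionReader(specifier)

    // Read documents 0 through documentNumber; only the last one stays in the JCas
    for (i in 0..documentNumber)
    {
        jCas.reset()
        collectionReader.getNext(jCas.getCas())
    }
    return jCas
}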
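
The loop in processCR uses Groovy's inclusive range 0..documentNumber, so documentNumber is zero-based and only the last document read is left in the JCas. For anyone who wants to see that behaviour in isolation, here is a minimal Java sketch that needs no UIMA on the classpath. The class ReaderLoopSketch, the method readDocument, and the Iterator<String> standing in for the collection reader are all hypothetical names for illustration, not part of UIMA. The hasNext check is an addition that the method above does not perform.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

class ReaderLoopSketch {

    // Hypothetical stand-in for the collection reader loop: the iterator hands
    // out document texts in order, where a real reader would fill a CAS.
    static String readDocument(Iterator<String> reader, int documentNumber) {
        String current = null;
        // Same bounds as the Groovy range 0..documentNumber (inclusive)
        // for a non-negative documentNumber.
        for (int i = 0; i <= documentNumber; i++) {
            // Added guard (assumption): fail clearly if the collection is too small.
            if (!reader.hasNext()) {
                throw new NoSuchElementException(
                        "reader ran out before document " + documentNumber);
            }
            // Overwrites the previous document, like jCas.reset() followed by getNext().
            current = reader.next();
        }
        return current;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("first", "second", "third");
        System.out.println(readDocument(docs.iterator(), 0)); // prints "first"
        System.out.println(readDocument(docs.iterator(), 2)); // prints "third"
    }
}
```

With a real reader, the equivalent guard would presumably be a call to collectionReader.hasNext() before each getNext.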



where EmptyAnnotator.xml is a descriptor file for an analysis engine that does nothing. The annotator class behind it is as follows:

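import org.apache.uima.analysis_component.JCasAnnotator_ImplBase;
import org.apache.uima.analysis_engine.AnalysisEngineProcessException;
import org.apache.uima.jcas.JCas;

public class EmptyAnnotator extends JCasAnnotator_ImplBase {
    public void process(JCas jCas) throws AnalysisEngineProcessException {
        // this annotator does nothing!
    }
}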
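
The descriptor file itself is not shown above, so here is a rough sketch of what a minimal EmptyAnnotator.xml could look like. This is an assumption rather than the actual file: the annotatorImplementationName must be the fully qualified class name, and no package is given here, so adjust it to match your code.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<analysisEngineDescription xmlns="http://uima.apache.org/resourceSpecifier">
  <frameworkImplementation>org.apache.uima.java</frameworkImplementation>
  <primitive>true</primitive>
  <!-- Assumed class name: prefix it with the package if the class has one -->
  <annotatorImplementationName>EmptyAnnotator</annotatorImplementationName>
  <analysisEngineMetaData>
    <name>EmptyAnnotator</name>
    <!-- If the collection reader creates custom annotation types, a
         typeSystemDescription declaring or importing them likely belongs here -->
  </analysisEngineMetaData>
</analysisEngineDescription>
```

Since the JCas comes from analysisEngine.newJCas(), its type system is whatever this descriptor declares, which is why the reader's types probably need to be listed here as well.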


I hope this is helpful.
