Hi Eli,

Based on the sample code, I presume you are using uimaFIT to wire up the pipeline. InputStreamCollectionReader probably came from cleartk-utils, though where it came from is probably not so important.

CollectionReaders were designed to read in a collection of documents for batch processing (hence the Files in a Directory example). If you are really looking to process dynamic text in some sort of real-time or service-oriented (SOA) setup, then you may want to take a look at creating the JCas yourself and setting the text on it. uimaFIT also has a good example of this [2]. Something like:
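A minimal sketch of that idea, under a few assumptions: it uses the Apache uimaFIT 2.x package names (org.apache.uima.fit, as in [2]); with the older org.uimafit packages the factory method is createAnalysisEngineDescription, as in your code. The class name DynamicTextExample and the process helper are made up for illustration, and the descriptor path is simply copied from your snippet. The JCas is obtained from the engine itself so that it carries the engine's type system.

```java
import org.apache.uima.analysis_engine.AnalysisEngine;
import org.apache.uima.analysis_engine.AnalysisEngineDescription;
import org.apache.uima.fit.factory.AnalysisEngineFactory;
import org.apache.uima.jcas.JCas;

// Hypothetical class name, for illustration only.
public class DynamicTextExample {

    /** Runs one in-memory document through an already-built engine and returns the JCas. */
    public static JCas process(AnalysisEngine engine, String documentText) throws Exception {
        // Ask the engine for a JCas so it is created with the engine's type system.
        JCas jCas = engine.newJCas();
        jCas.setDocumentText(documentText);
        engine.process(jCas);
        return jCas;
    }

    public static void main(String[] args) throws Exception {
        // Descriptor path taken from the code in the original question (assumed to be on the classpath).
        AnalysisEngineDescription description = AnalysisEngineFactory.createEngineDescription(
                "desc/analysis_engine/AggregatePlaintextUMLSProcessor");

        // Build the engine once; it can then be reused for each incoming text.
        AnalysisEngine engine = AnalysisEngineFactory.createEngine(description);
        try {
            JCas jCas = process(engine, "No edema, some soreness, denies pain.");
            System.out.println("Processed " + jCas.getDocumentText().length() + " characters");
        } finally {
            engine.destroy();
        }
    }
}
```

No CollectionReader is involved here, so InputStreamCollectionReader is not needed at all. The annotations (and from them the CUIs) can then be read from the returned JCas instead of writing XMI files.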
If it's batch processing, you may find Tim's BagOfCUIsGenerator example [1] helpful, based on your example.

[1] https://svn.apache.org/repos/asf/ctakes/trunk/ctakes-clinical-pipeline/src/main/java/org/apache/ctakes/clinicalpipeline/runtime/BagOfCUIsGenerator.java
[2] http://svn.apache.org/repos/asf/uima/sandbox/uimafit/trunk/uimafit-examples/src/main/java/org/apache/uima/fit/examples/tutorial/ex1/RoomNumberAnnotatorPipeline.java

Hope that helps...

On Mon, Oct 21, 2013 at 4:22 PM, eli mizzou <[email protected]> wrote:

> Hi cTAKES folks,
>
> I am trying to figure out how to run the Clinical Document Pipeline from
> Java. I have a set of clinical documents as plain text. I want to parse
> these documents and extract a list of <doc_ID, CUI, freq>; that is, in
> document *doc_ID*, there is *CUI* with a frequency of *freq*. I spent
> several days installing cTAKES and looking for a solution. I narrowed it
> down to ClinicalPipelineWithUmls.java, which gets a text and runs
> SimplePipeline with an AnalysisEngineDescription. Here is a part of the
> code:
>
> String documentText = "Text of document to test goes here, such as the "
>     + "following. No edema, some soreness, denies pain.";
> InputStream inStream =
>     InputStreamCollectionReader.convertToByteArrayInputStream(documentText);
> CollectionReader collectionReader =
>     InputStreamCollectionReader.getCollectionReader(inStream);
> AnalysisEngineDescription pipelineIncludingUmlsDictionaries =
>     AnalysisEngineFactory.createAnalysisEngineDescription(
>         "desc/analysis_engine/AggregatePlaintextUMLSProcessor");
> AnalysisEngineDescription xWriter =
>     AnalysisEngineFactory.createPrimitiveDescription(
>         XWriter.class,
>         XWriter.PARAM_OUTPUT_DIRECTORY_NAME, AssertionConst.evalOutputDir,
>         XWriter.PARAM_XML_SCHEME_NAME, XWriter.XMI,
>         XWriter.PARAM_FILE_NAMER_CLASS_NAME, CtakesFileNamer.class.getName());
> SimplePipeline.runPipeline(collectionReader,
>     pipelineIncludingUmlsDictionaries, xWriter);
> System.out.println("Done at " + new Date());
>
> The problem is that it cannot find "*InputStreamCollectionReader*". I
> searched for it but with no success so far! Would you please give me a
> hint or point me in the right direction?
>
> Thanks for any help!
>
> -Eli
