Richard Eckart de Castilho <richard.eckart@...> writes:
>
> You should take a look at the JCasIterable (cf. [1] - Example in Groovy, but
> JCasIterable is a Java class and works nicely in Java too, just I have no
> example in Java).
>
> JCasIterable basically allows you to iterate over the CASes produced by your
> pipeline. In such a loop, you can extract and collect the data you need from
> the CASes, e.g. putting it into a List<String> and returning it. Mind that you
> should *not* try to keep hold of full CASes or FeatureStructures (including
> Annotations and stuff). You need to copy the data from the CAS, otherwise
> it will be corrupted.

Hi Richard,

I was reading your reference for using JCasIterable
(https://code.google.com/p/dkpro-core-asl/wiki/GroovyRecipies#OpenNLP_Part-of-speech_tagging_pipeline_using_JCasIterable_and_c),
but I have some questions.

Your example creates a JCasIterable using the following code:

    def pipeline = new JCasIterable(
        createReaderDescription(TextReader,
            TextReader.PARAM_PATH, args[0],
            TextReader.PARAM_LANGUAGE, args[1],
            TextReader.PARAM_PATTERNS, ["[+]*.txt"]),
        createEngineDescription(OpenNlpSegmenter),
        createEngineDescription(OpenNlpPosTagger));

I assume that createReaderDescription() and createEngineDescription() return a
CollectionReaderDescription and an AnalysisEngineDescription respectively. But
when I looked at the constructor for JCasIterable, it only accepts a
CollectionReader and an array of AnalysisEngine instances:

    JCasIterable(final CollectionReader aReader, final AnalysisEngine... aEngines)

Why is this so?
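
On the quoted advice about copying data out of the CAS rather than holding on to it, here is how I currently understand the point, as a minimal plain-Java sketch. It does not use UIMA or uimaFIT at all: ReusedDoc and pipeline() are hypothetical stand-ins for a CAS and a JCasIterable, and the sketch assumes that the pipeline hands back the same object on every iteration and overwrites its contents for each new document.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class CopyOutSketch {

    /** Hypothetical stand-in for a CAS: one mutable object reused for every document. */
    static final class ReusedDoc {
        private final StringBuilder text = new StringBuilder();

        void reset(String newText) {
            text.setLength(0);
            text.append(newText);
        }

        /** Returns an independent String copy of the current content. */
        String getText() {
            return text.toString();
        }
    }

    /** Hypothetical stand-in for JCasIterable: yields the SAME instance each time. */
    static Iterable<ReusedDoc> pipeline(final List<String> inputs) {
        final ReusedDoc shared = new ReusedDoc();
        return () -> new Iterator<ReusedDoc>() {
            private int next = 0;

            @Override
            public boolean hasNext() {
                return next < inputs.size();
            }

            @Override
            public ReusedDoc next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                // Overwrite the shared object with the next document.
                shared.reset(inputs.get(next++));
                return shared;
            }
        };
    }

    /** Safe: copy the needed data out while the object still holds it. */
    static List<String> collectCopies(List<String> inputs) {
        List<String> out = new ArrayList<>();
        for (ReusedDoc doc : pipeline(inputs)) {
            out.add(doc.getText());
        }
        return out;
    }

    /** Unsafe: keep the object itself and read it only after the loop. */
    static List<String> collectReferences(List<String> inputs) {
        List<ReusedDoc> held = new ArrayList<>();
        for (ReusedDoc doc : pipeline(inputs)) {
            held.add(doc);
        }
        List<String> out = new ArrayList<>();
        for (ReusedDoc doc : held) {
            out.add(doc.getText());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> inputs = Arrays.asList("first", "second", "third");
        System.out.println(collectCopies(inputs));     // prints [first, second, third]
        System.out.println(collectReferences(inputs)); // prints [third, third, third]
    }
}
```

If the real CAS behaves like ReusedDoc here, then collecting plain Strings (or other copied values) inside the loop is the safe pattern, and anything that still points into the CAS after the loop moves on would see the wrong document.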
