Folks,

Are there any samples of using the UIMA Tika FilesystemReader component
with uimaFIT?

I've been playing around with it and getting several errors (probably my
fault), but I can't seem to find a similar example on the website or mailing
list despite searching. I have downloaded and compiled the source (UIMA, UIMA
tools, examples); the existing code is clear, but when I try to combine the
pieces to do the following I get errors.

The aim is to:
1) Read a collection of documents using the UIMA Tika FilesystemReader
   component
2) Later, do more serious POS tagging.

The code is:

    CollectionReader readerEngine =
        CollectionReaderFactory.createCollectionReader(
            FileSystemCollectionReader.class,
            FileSystemCollectionReader.PARAM_INPUTDIR, "C:\\Somelocation",
            FileSystemCollectionReader.PARAM_ENCODING, "UTF-8",
            FileSystemCollectionReader.PARAM_LANGUAGE, "EN");

    AggregateBuilder builder = new AggregateBuilder();

    SimplePipeline.runPipeline(readerEngine, builder.createAggregate());

And the error is:

Exception in thread "main" org.apache.uima.cas.CASRuntimeException: JCas
type "org.apache.uima.examples.SourceDocumentInformation" used in Java
code, but was not declared in the XML type descriptor.

A similar error is discussed at the link below, but it is not clear to me
how to implement the suggested fix:
http://user.uima.apache.narkive.com/b940cOrO/how-to-test-a-collectionreader

Any suggestions or pointers on the web that I should be looking at?

Thanks for your help

Paul
