This is probably a very naive question, but I can't seem to find anything about it. I have a lot of XML files (clinical trial descriptions). My current workflow is to run a preprocessor that parses the XML and generates text files in a simple format. I then run these files through a UIMA pipeline: FileCollectionReader loads the text files, RUTA parses the simple format, the MetaMap annotator adds UMLS annotations, and finally a writer generates RDF triples from the UIMA annotations and loads them into a database. This works but is clunky, especially the preprocessing step, and I feel there has to be a better way. Is there any built-in support for reading XML files, or do I need to write my own CollectionReader? Are there any other tools within UIMA for handling XML text?
Thanks,
Bonnie MacKellar
