Hey, just wanted to say that I haven't gotten around to making the component available yet; I will do so first thing next week!
Best,
Erik

> On 20. Feb 2019, at 19:47, Bonnie MacKellar <[email protected]> wrote:
>
> Hi,
> Yes, we are using that format. I have a parser that I wrote, but it isn't
> integrated into UIMA. It runs separately and loads the full clinical trial
> data into a triplestore (Stardog). I would be interested in your system
> since I am not really familiar with how to write file readers in the UIMA
> framework. Perhaps I can merge my parser into it and end up with just the
> right thing. If you can make it available, I would definitely be
> interested. I will take a look at the other links as well. Thanks!!
>
> Bonnie MacKellar
>
> On Wed, Feb 20, 2019 at 3:54 AM Erik Fäßler <[email protected]>
> wrote:
>
>> Dear Bonnie,
>>
>> are you talking about the clinical trial XML format used by
>> ClinicalTrials.gov <http://clinicaltrials.org/> by any chance?
>> If so, I did create a UIMA reader for these data. It's not perfect but
>> perhaps enough for your purposes and also you might want to enhance it.
>> Please let me know if you would be interested in that, I did not get
>> around to making it publicly available yet but could do so quickly.
>>
>> To answer the general question to the best of my knowledge:
>> There is no such thing as a general XML reader built into the UIMA
>> framework. For all non-trivial formats, a specific reader is necessary.
>> This also holds true with regard to the employed type system.
>> That being said, there are UIMA readers that try to serve as a general XML
>> reading facility, e.g. the “XML Reader” from our lab (JULIELab,
>> https://github.com/JULIELab/jcore-base/tree/master/jcore-xml-reader).
>> However, in my experience XML inputs come in a lot of different forms
>> which might often not be suitable to a generic approach, which is why I
>> wrote quite a few UIMA readers for specific XML formats in the past.
>>
>> Hope that helps,
>>
>> Erik
>>
>>> On 20. Feb 2019, at 01:13, Bonnie MacKellar <[email protected]>
>>> wrote:
>>>
>>> This is probably a very naive question, but I can't seem to find anything
>>> about this. I currently have a lot of XML files (clinical trial
>>> descriptions). My current workflow is to run a preprocessor that parses
>>> the XML and generates text files in a simple format. I then run these
>>> files in a UIMA pipeline, using FileCollectionReader to load the text
>>> files, RUTA to parse the simple format, the Metamap annotator to do some
>>> UMLS annotations, and finally I have a writer that generates RDF triples
>>> from the UIMA annotations and loads the triples into a database. This has
>>> worked but is clunky, especially the preprocessing. I feel like there has
>>> to be a better way. Is there any support for reading XML files or do I
>>> need to write my own CollectionReader? Are there any other tools within
>>> UIMA for handling XML text?
>>>
>>> thanks,
>>> Bonnie MacKellar
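
For anyone following the thread, here is a rough sketch of the format-specific extraction step discussed above. It is not the reader mentioned in this thread and it does not use UIMA at all: it is plain Python, chosen only to keep the example self-contained. The sample record, the element names (`id_info/nct_id`, `brief_title`, `eligibility/criteria/textblock`), and the function name `extract_document` are all assumptions made for illustration. In a real UIMA reader, comparable logic would presumably live in the method that fills each CAS with document text.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample record. The element names are assumptions for
# illustration and may not match the real clinical trial XML exactly.
SAMPLE = """<clinical_study>
  <id_info><nct_id>NCT00000000</nct_id></id_info>
  <brief_title>Example Trial</brief_title>
  <eligibility>
    <criteria>
      <textblock>
        Inclusion Criteria:
        - Age 18 or older
      </textblock>
    </criteria>
  </eligibility>
</clinical_study>"""


def extract_document(xml_string):
    """Pull a few fields out of one trial record as plain text."""
    root = ET.fromstring(xml_string)

    def text_of(path):
        # Return the whitespace-normalized text at `path`, or "" if absent.
        node = root.find(path)
        if node is None or node.text is None:
            return ""
        return " ".join(node.text.split())

    return {
        "id": text_of("id_info/nct_id"),
        "title": text_of("brief_title"),
        "criteria": text_of("eligibility/criteria/textblock"),
    }


if __name__ == "__main__":
    print(extract_document(SAMPLE))
```

The point of the sketch is only that the mapping from XML paths to document fields is specific to one format, which is why a generic XML reader often falls short.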
