Re: XML files as input to UIMA?

Erik Fäßler Fri, 22 Feb 2019 04:04:12 -0800

Hey,

just wanted to say that I haven’t gotten around to making the component
available yet; I will do so first thing next week!


Best,

Erik

> On 20. Feb 2019, at 19:47, Bonnie MacKellar <[email protected]> wrote:
> 
> Hi,
> Yes, we are using that format. I have a parser that I wrote, but it isn't
> integrated into UIMA. It runs separately and loads the full clinical trial
> data into a triplestore (Stardog). I would be interested in your system
> since I am not really familiar with how to write file readers in the UIMA
> framework. Perhaps I can merge my parser into it and end up with just the
> right thing. If you can make it available, I would definitely be
> interested.  I will take a look at the other links as well.  Thanks!!
> 
> Bonnie MacKellar
> 
> On Wed, Feb 20, 2019 at 3:54 AM Erik Fäßler <[email protected]> wrote:
> 
>> Dear Bonnie,
>> 
>> are you talking about the clinical trial XML format used by
>> ClinicalTrials.gov by any chance?
>> If so, I did create a UIMA reader for these data. It's not perfect but
>> perhaps enough for your purposes, and you might also want to enhance it.
>> Please let me know if you would be interested in that. I have not gotten
>> around to making it publicly available yet but could do so quickly.
>> 
>> To answer the general question to the best of my knowledge:
>> There is no such thing as a general XML reader built into the UIMA
>> framework. For all non-trivial formats, a specific reader is necessary.
>> This also holds true with regard to the employed type system.
>> That being said, there are UIMA readers that try to serve as a general XML
>> reading facility, e.g. the “XML Reader” from our lab (JULIELab,
>> https://github.com/JULIELab/jcore-base/tree/master/jcore-xml-reader).
>> However, in my experience, XML inputs come in many different forms that
>> are often not suited to a generic approach, which is why I wrote quite a
>> few UIMA readers for specific XML formats in the past.
>> 
>> Hope that helps,
>> 
>> Erik
>> 
>>> On 20. Feb 2019, at 01:13, Bonnie MacKellar <[email protected]> wrote:
>>> 
>>> This is probably a very naive question, but I can't seem to find anything
>>> about this. I currently have a lot of XML files (clinical trial
>>> descriptions). My current workflow is to run a preprocessor that parses the
>>> XML and generates text files in a simple format. I then run these files in
>>> a UIMA pipeline, using FileCollectionReader to load the text files, RUTA to
>>> parse the simple format, the Metamap annotator to do some UMLS annotations,
>>> and finally I have a writer that generates RDF triples from the UIMA
>>> annotations and loads the triples into a database. This has worked but is
>>> clunky, especially the preprocessing. I feel like there has to be a better
>>> way. Is there any support for reading XML files or do I need to write my
>>> own CollectionReader? Are there any other tools within UIMA for handling
>>> XML text?
>>> 
>>> thanks,
>>> Bonnie MacKellar
>> 
>>
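
For readers of this thread: the separate preprocessing step described above can, in principle, be folded into a format-specific reader. The following is only a minimal sketch of the XML-to-text part of such a reader, using nothing but the JDK. It is not the component mentioned in this thread. The class name TrialXmlText and the element paths are assumptions made for illustration and should be checked against the actual trial files.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper: turns one clinical trial XML document into plain text.
final class TrialXmlText {

    // Assumed element paths; adjust them to the schema of the real files.
    private static final String[] TEXT_PATHS = {
        "/clinical_study/brief_title",
        "/clinical_study/brief_summary/textblock",
        "/clinical_study/eligibility/criteria/textblock"
    };

    // Returns the trimmed text of each path that is present, separated by blank lines.
    static String extract(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            StringBuilder text = new StringBuilder();
            for (String path : TEXT_PATHS) {
                // evaluate() yields the string value of the first match, or "" if none
                String value = xpath.evaluate(path, doc).trim();
                if (!value.isEmpty()) {
                    if (text.length() > 0) {
                        text.append("\n\n");
                    }
                    text.append(value);
                }
            }
            return text.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse trial XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<clinical_study><brief_title>Example trial</brief_title>"
                + "<brief_summary><textblock>A short summary.</textblock></brief_summary>"
                + "</clinical_study>";
        System.out.println(extract(xml));
    }
}
```

Inside a UIMA CollectionReader, the returned string would typically be handed to the CAS in getNext(CAS) via setDocumentText, so that downstream annotators see the same text the preprocessor used to write to disk. Creating annotations for the individual XML sections would additionally require types from the pipeline's own type system, which is not shown here.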
