I don't know of anyone who has done exactly what you're asking, but I
think it's a really interesting idea. My first thought was that you
could try the Finding typeID, which would be one level less granular
than the TUIs. But that covers many more TUIs:
T033,T034,T040,T041,T042,T043,T044,T045,T046,T056,T057,T184
That list contains T184, but also the noisier T033 and many others
(though not T047, going by the list above)! So that would make your
problem worse.
Unfortunately, from what you're saying, it sounds like the UMLS
doesn't have the granularity where you need it to represent only the
findings that you're interested in.
Are there any examples of the kinds of terms that come up from T033 and
T047 that you aren't interested in? I'm wondering whether there's a
pattern you could write rules for, so that you can over-generate (pull
T184, T033 and T047) and then filter out the unwanted terms with those
rules. Just throwing out a simple idea.
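
To make the over-generate-then-filter idea a bit more concrete, here is
a rough sketch in Python. It is not cTAKES code: the Mention class, the
keep_as_symptom and filter_mentions helpers, and both exclusion
patterns are made up for illustration. It assumes you have already
exported each extracted mention as a (text, TUI) pair. The real rules
would have to come from looking at what your Research Coordinators
rejected.

```python
import re
from dataclasses import dataclass

# TUIs named in this thread: over-generate on all three.
CANDIDATE_TUIS = {"T184", "T033", "T047"}

# Hypothetical exclusion rules, NOT derived from real data.
# Replace them with patterns found in your own false positives.
EXCLUDE_PATTERNS = [
    re.compile(r"\bhistory of\b", re.IGNORECASE),
    re.compile(r"\b(normal|negative)\b", re.IGNORECASE),
]


@dataclass(frozen=True)
class Mention:
    """One extracted term and its UMLS semantic type (hypothetical shape)."""
    text: str
    tui: str


def keep_as_symptom(mention: Mention) -> bool:
    """Keep T184 as-is; keep T033/T047 only if no exclusion rule fires."""
    if mention.tui not in CANDIDATE_TUIS:
        return False
    if mention.tui == "T184":
        return True
    return not any(p.search(mention.text) for p in EXCLUDE_PATTERNS)


def filter_mentions(mentions):
    """Apply the rules to an over-generated list of mentions."""
    return [m for m in mentions if keep_as_symptom(m)]


if __name__ == "__main__":
    sample = [
        Mention("headache", "T184"),
        Mention("family history of cancer", "T033"),
        Mention("chest tightness", "T033"),
        Mention("aspirin", "T121"),
    ]
    for m in filter_mentions(sample):
        print(m.tui, m.text)
```

Keeping the rules in a plain list means they can be reviewed and
extended by whoever compares the output against the manual
abstraction, without touching the extraction step itself.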
Tim
Do you think if you moved to one level more abstract you would get too
much?
On 08/06/2013 11:47 AM, Bohne, Jacqueline R wrote:
We are trying to create a cTAKES process that will extract all
symptoms from our documents. In our first attempt, we used the UMLS
dictionary and pulled anything with a TUI of T184 (Sign or Symptom).
While this worked, we found that when we compared it to what our
Research Coordinators manually abstracted as symptoms, there were
quite a few differences. When we looked into these differences we
found a lot of the extra terms were considered either Finding (T033)
or Disease or Syndrome (T047) in UMLS. We would rather not just add
these TUIs to our NLP process because then we would end up with many
more terms than just symptoms in our results.
Has anyone else tried to create a database of symptoms using NLP? Or
are you aware of a better solution for creating a symptoms database?
Thank you for your time!
Thanks,
Jacquie Bohne
Research Programmer/Analyst
Marshfield Clinic