Hi Jennifer,

Mapping terms to CUIs is its own problem, and there are a few nice tools
already available that might be of some use. We've used MetaMap to good
effect for this problem, so you might want to consider looking there.

I'd be curious if other users have recommendations as well.

Good luck,

On Fri, Jun 2, 2017 at 7:56 PM, Jennifer Wilson jen.wilson...@gmail.com
[umls-similarity] <umls-similarity@yahoogroups.com> wrote:

> Hi Ted,
> Thank you again for all of this. I'm sorry I had to put down this project
> for a few days and am only now getting back to it.
> I see that ontologies change and reproducing that result might not be the
> best sanity check on the scripts that I wrote.
> I'm going to try and figure out how to map to CUI terms and I'll be in
> touch if I get stuck again. Thanks,
> On Sun, May 28, 2017 at 10:59 AM, Ted Pedersen duluth...@gmail.com
> [umls-similarity] <umls-similarity@yahoogroups.com> wrote:
>> This is perhaps a bit more than you were looking for, but there are quite
>> a few command line tools available with UMLS::Similarity when you install
>> locally that can be helpful for digging into situations like this. When I
>> look for the path from each of these CUIs to the ROOT (of MSH) I find that
>> one of them does not have a path to the root, while the other does (see
>> command output below).
>> The lack of a path to the root is going to cause a lot of measures to
>> report a -1 value (since path, for example, relies on finding this path as
>> a part of its computation). In fact, not having a path to the root makes me
>> question if C0156543 is in MSH at all, so it might even be that the CUI is
>> no longer a part of MSH (and not just lacking a path to the root). But,
>> regardless, clearly something has changed since 2009 that is causing this
>> measure to return a different value. This happens in some cases since UMLS
>> continues to evolve and CUIs are added, removed, etc. It's important to
>> know what version of the UMLS a previous study has used if you are
>> interested in getting a very exact comparison. In the case of our AMIA 2009
>> paper we used 2008AB, so things have no doubt changed a bit since then.
>> tpederse@maraca:~$ findPathToRoot.pl C0156543
>> UMLS-Interface Configuration Information:
>> (Default Information - no config file)
>>   Sources (SAB):
>>      MSH
>>   Relations (REL):
>>      PAR
>>      CHD
>>   Sources (SABDEF):
>>      UMLS_ALL
>>   Relations (RELDEF):
>>      UMLS_ALL
>> There are no paths from the given C0156543 to the root.
>> tpederse@maraca:~$ findPathToRoot.pl C0000786
>> UMLS-Interface Configuration Information:
>> (Default Information - no config file)
>>   Sources (SAB):
>>      MSH
>>   Relations (REL):
>>      PAR
>>      CHD
>>   Sources (SABDEF):
>>      UMLS_ALL
>>   Relations (RELDEF):
>>      UMLS_ALL
>> The paths between abortions, spontaneous (C0000786) and the root:
>>   => C0000000 (**UMLS ROOT**) C1135584 (mesh headings) C1256739 (mesh
>> descriptors) C1256741 (topical descriptor) C0012674 (diseases (mesh
>> category)) C1720765 (female urogenital dis pregnancy compl) C0032962 (compl
>> pregn) C0000786 (abortions, spontaneous)
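When checking many CUIs, it can help to parse this output instead of reading it. The sketch below is a minimal, hypothetical helper that assumes the output of findPathToRoot.pl has been captured as a string (for example with subprocess). It keys on the two messages shown above ("There are no paths from the given ..." and the "=>" path lines); those exact strings may differ in other versions of UMLS::Interface, which is an assumption.

```python
import re

CUI_PATTERN = re.compile(r"\bC\d{7}\b")


def paths_to_root(output):
    """Return each reported path to the root as a list of CUIs ([] means no path)."""
    if "There are no paths from the given" in output:
        return []
    # Each path starts with "=>" and may wrap over several lines.
    return [CUI_PATTERN.findall(chunk) for chunk in output.split("=>")[1:]]


no_path = "There are no paths from the given C0156543 to the root."
one_path = """The paths between abortions, spontaneous (C0000786) and the root:
  => C0000000 (**UMLS ROOT**) C1135584 (mesh headings) C1256739 (mesh
descriptors) C1256741 (topical descriptor) C0012674 (diseases (mesh
category)) C1720765 (female urogenital dis pregnancy compl) C0032962 (compl
pregn) C0000786 (abortions, spontaneous)"""

print(paths_to_root(no_path))        # []
print(len(paths_to_root(one_path)))  # 1
```

A CUI whose result is an empty list is a candidate for the -1 values described above.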
>> On Sun, May 28, 2017 at 12:43 PM, Ted Pedersen <duluth...@gmail.com>
>> wrote:
>>> Hi Jennifer,
>>> Thanks for sharing this question. I think in general if you have a
>>> choice between using CUIs or terms with UMLS::Similarity, your best option
>>> is to use the CUIs. Terms can map to multiple CUIs, and UMLS::Similarity
>>> might pick a CUI associated with a sense of the term you aren't intending.
>>> Also, if you misspell a term or don't specify it exactly correctly, then it
>>> shows up as not found. One useful resource for replicating similarity
>>> measure studies (like the one you cite) is the following page which
>>> includes term mappings for several of the datasets we've worked with over
>>> the years.
>>> http://www-users.cs.umn.edu/~bthomson/corpus/corpus.html
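The two problems mentioned above (one term mapping to several CUIs, and a misspelled term simply not being found) show up even in a naive exact-match lookup. This is a minimal sketch, assuming a local copy of the UMLS MRCONSO.RRF file, whose pipe-delimited rows carry the CUI, language, source (SAB) and string in fields 0, 1, 11 and 14. It is not how UMLS::Similarity maps terms, and it is far cruder than MetaMap; the rows built by the hypothetical `fake_row` helper use made-up CUIs, not data from a real release.

```python
from collections import defaultdict

# Field positions in a pipe-delimited MRCONSO.RRF row (UMLS Rich Release Format).
CUI, LAT, SAB, STR = 0, 1, 11, 14


def build_lookup(rrf_lines, sab="MSH", lat="ENG"):
    """Map each lower-cased string in one source to the set of CUIs that carry it."""
    lookup = defaultdict(set)
    for line in rrf_lines:
        fields = line.rstrip("\n").split("|")
        if fields[SAB] == sab and fields[LAT] == lat:
            lookup[fields[STR].lower()].add(fields[CUI])
    return lookup


def term_to_cuis(lookup, term):
    """Exact, case-insensitive match only: a misspelled term returns []."""
    return sorted(lookup.get(term.strip().lower(), set()))


def fake_row(cui, sab, string):
    """Hypothetical MRCONSO-shaped row (18 fields plus the trailing '|')."""
    fields = [""] * 18
    fields[CUI], fields[LAT], fields[SAB], fields[STR] = cui, "ENG", sab, string
    return "|".join(fields) + "|"


# Made-up CUIs: one string shared by two concepts, as happens with ambiguous terms.
rows = [
    fake_row("C9000001", "MSH", "Cold"),
    fake_row("C9000002", "MSH", "Cold"),
    fake_row("C9000003", "MSH", "Miscarriage"),
]
lookup = build_lookup(rows)
print(term_to_cuis(lookup, "cold"))       # ['C9000001', 'C9000002']
print(term_to_cuis(lookup, "Miscarage"))  # []
```

When a term returns more than one CUI, something (a human, MetaMap, or context) still has to pick the intended sense.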
>>> I will admit to being a little puzzled about the case of abortion -
>>> miscarriage. The paper you cite clearly reports a value based on MSH, but
>>> as I try to run that query now I get a value of -1 (even when using the
>>> CUIs). However, it appears that each of the CUIs is found in MSH, but that
>>> somehow we are not able to compute some of the measures (a path length, for
>>> example). This suggests that there is not a path between the two CUIs,
>>> which has something to do with the structure of UMLS/MSH.
>>> One quick and dirty way to see if a CUI is in MSH is to find the path
>>> length between a CUI and itself. If it is present in MSH, that value will
>>> be 1. We see that for each of the CUIs used for abortion and miscarriage.
>>> tpederse@maraca:~$ perl query-umls-similarity-webinterface.pl --measure path --sab MSH C0156543 C0156543
>>> Default Settings:
>>>   --default http://atlas.ahc.umn.edu/
>>>   --rel PAR/CHD
>>> User Settings:
>>>   --measure path
>>> 1<>Unspecified abortion NOS(C0156543)<>Unspecified abortion NOS(C0156543)
>>> tpederse@maraca:~$ perl query-umls-similarity-webinterface.pl --measure path --sab MSH C0000786 C0000786
>>> Default Settings:
>>>   --default http://atlas.ahc.umn.edu/
>>>   --rel PAR/CHD
>>> User Settings:
>>>   --measure path
>>> 1<>Abortions.spontaneous(C0000786)<>Abortions.spontaneous(C0000786)
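The quick self-pair check above can be automated by parsing the result lines. This is a sketch only: the `score<>term(CUI)<>term(CUI)` layout is inferred from the three result lines in this message, and other measures or options may print differently (an assumption). The helper names are hypothetical.

```python
import re

# Result line shape seen above: score<>term(CUI)<>term(CUI)
RESULT_LINE = re.compile(r"^(-?\d+(?:\.\d+)?)<>(.*?)\((C\d{7})\)<>(.*)\((C\d{7})\)$")


def parse_result(line):
    """Return (score, (term1, cui1), (term2, cui2)), or None for non-result lines."""
    match = RESULT_LINE.match(line.strip())
    if match is None:
        return None  # e.g. "Default Settings:" and the other header lines
    score, term1, cui1, term2, cui2 = match.groups()
    return float(score), (term1, cui1), (term2, cui2)


def found_in_source(self_pair_line):
    """The quick check above: path(CUI, CUI) == 1 suggests the CUI is in the source."""
    parsed = parse_result(self_pair_line)
    if parsed is None:
        return False
    score, (_, cui1), (_, cui2) = parsed
    return cui1 == cui2 and score == 1.0


print(found_in_source("1<>Abortions.spontaneous(C0000786)<>Abortions.spontaneous(C0000786)"))  # True
```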
>>> However, when I try to find the path length between the two CUIs, I get
>>> -1. This suggests that the CUIs are not joined by PAR/CHD relations. Note
>>> that below you can see that the terms for the CUIs have been looked up,
>>> which shows us that MSH knows about them.
>>> tpederse@maraca:~$ perl query-umls-similarity-webinterface.pl --measure path --sab MSH C0156543 C0000786
>>> Default Settings:
>>>   --default http://atlas.ahc.umn.edu/
>>>   --rel PAR/CHD
>>> User Settings:
>>>   --measure path
>>> -1<>Unspecified abortion NOS(C0156543)<>Abortions.spontaneous(C0000786)
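Why a CUI can score 1 against itself yet -1 against a related CUI can be seen on a toy graph. This sketch assumes the path measure is the reciprocal of the number of nodes on the shortest path (consistent with a self-pair scoring 1 above) and returns -1 when no path exists. It is a simplification: it walks PAR/CHD links in both directions with plain breadth-first search, which is not necessarily how UMLS::Similarity traverses the full MSH hierarchy. The graph holds only the chain from the findPathToRoot.pl output above, plus C0156543 as a concept that is present but has no links (an assumption mirroring the results above).

```python
from collections import deque


def add_edge(graph, parent, child):
    """Record a PAR/CHD link; this sketch walks it in both directions."""
    graph.setdefault(parent, set()).add(child)
    graph.setdefault(child, set()).add(parent)


def path_similarity(graph, a, b):
    """1 / (nodes on the shortest path from a to b), or -1 when no path exists."""
    if a not in graph or b not in graph:
        return -1
    nodes_on_path = {a: 1}  # node -> nodes on the best path from a, counting a
    queue = deque([a])
    while queue:
        node = queue.popleft()
        if node == b:
            return 1 / nodes_on_path[node]
        for neighbor in graph[node]:
            if neighbor not in nodes_on_path:
                nodes_on_path[neighbor] = nodes_on_path[node] + 1
                queue.append(neighbor)
    return -1


# Toy MSH fragment: the root-to-C0000786 chain, plus an isolated C0156543.
chain = ["C0000000", "C1135584", "C1256739", "C1256741",
         "C0012674", "C1720765", "C0032962", "C0000786"]
graph = {"C0156543": set()}
for parent, child in zip(chain, chain[1:]):
    add_edge(graph, parent, child)

print(path_similarity(graph, "C0156543", "C0156543"))  # 1.0
print(path_similarity(graph, "C0156543", "C0000786"))  # -1
```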
>>> So, in any case, it would appear that something has changed in the
>>> structure of MSH since we reported our results in the 2009 AMIA paper you
>>> mention. I'm not sure what that is. But, I think the general message is
>>> that if you can use CUIs it will normally be more reliable to do that.
>>> Mapping terms to CUIs is of course its own problem, but UMLS::Similarity
>>> doesn't do anything terribly fancy with that, and so probably whatever you
>>> do will be more extensive and reliable than what UMLS::Similarity would
>>> do...
>>> I hope this helps somehow, and please do feel free to follow up.
>>> Thoughts from other users on this issue would also be most welcome!
>>> Cordially,
>>> Ted
>>> On Sat, May 27, 2017 at 12:18 PM, Jennifer Wilson
>>> jen.wilson...@gmail.com [umls-similarity] <umls-similarity@yahoogroups.com> wrote:
>>>> Hi all,
>>>> I'm resending this now that I'm subscribed. Any advice would be much
>>>> appreciated! Thank you,
>>>> ---------- Forwarded message ----------
>>>> From: Jennifer Wilson <jen.wilson...@gmail.com>
>>>> Date: Tue, May 23, 2017 at 6:13 PM
>>>> Subject: Help with the best approach for using the query-UMLS interface
>>>> To: umls-similarity@yahoogroups.com
>>>> Hello UMLS similarity team,
>>>> I am trying to compute the similarity between ~30K disease/phenotype
>>>> terms. Ideally, I would have a matrix of similarity for these terms.
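Since similarity scores here are symmetric, a full matrix only needs each unordered pair scored once. The sketch below assumes a hypothetical `score(term1, term2)` callable standing in for whatever actually computes the similarity (a call to the query script, a local install, and so on).

```python
import math
from itertools import combinations


def similarity_matrix(terms, score):
    """Symmetric n x n matrix; each unordered pair is scored only once."""
    n = len(terms)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        matrix[i][i] = score(terms[i], terms[i])
    for i, j in combinations(range(n), 2):
        matrix[i][j] = matrix[j][i] = score(terms[i], terms[j])
    return matrix


# Distinct pairs (excluding self-pairs) for ~30K terms.
print(math.comb(30_000, 2))  # 449985000
```

With roughly 30,000 terms that is 30,000 × 29,999 / 2 = 449,985,000 pairs, which is worth keeping in mind before sending each pair to a remote web interface. At that size a dense list of lists is also very large in memory, so a NumPy array or writing pairs to disk may be more practical.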
>>>> My first attempt was to write a Python script that calls the
>>>> query-umls-similarity-webinterface.pl script. Before running the script
>>>> on my whole dataset, however, I tried to recreate the scores in Table 1
>>>> of this paper (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2815481/).
>>>> Here's the command I am using:
>>>> ./query-umls-similarity-webinterface.pl --sab MSH --rel PAR/CHD "Abortion" "Miscarriage"
>>>> Default Settings:
>>>>   --default http://atlas.ahc.umn.edu/
>>>>   --measure path
>>>> User Settings:
>>>>   --rel PAR/CHD
>>>> (-1.0, 'Abortion', 'Miscarriage')
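A Python wrapper around the query script might look like the sketch below. It is hypothetical: the script path is assumed to be the current directory, and it uses only the flags that appear in this thread (`--measure`, `--sab`, `--rel`). Passing the arguments as a list, rather than one shell string, keeps terms with commas or parentheses such as "Hypotrichosis 2, 146520 (3)" intact, and the same call works with CUIs in place of terms.

```python
import subprocess

# Assumed location of the script; adjust as needed.
SCRIPT = "./query-umls-similarity-webinterface.pl"


def build_command(term1, term2, measure="path", sab="MSH", rel="PAR/CHD"):
    """Argument list using only flags that appear in this thread; terms or CUIs go last."""
    return ["perl", SCRIPT, "--measure", measure, "--sab", sab, "--rel", rel,
            term1, term2]


def run_query(term1, term2, **options):
    """Run one query and return the script's raw stdout lines."""
    completed = subprocess.run(build_command(term1, term2, **options),
                               capture_output=True, text=True, check=True)
    return completed.stdout.splitlines()
```

For example, `run_query("C0156543", "C0000786")` would issue the same query as the CUI-based commands shown earlier in this thread.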
>>>> I also have not processed the text in my dataset much. I have basically
>>>> pulled diseases and phenotypes from DisGeNET, OMIM, PheWAS, and the GWAS
>>>> catalogue. If I'm using data from all of these sources, do you recommend
>>>> sending them directly to the query interface? Should I try to map them to
>>>> CUIs? (examples below)
>>>> Before I download the database and attempt to query it (its query
>>>> language is not one that I use in my current work), I just wanted an
>>>> outside perspective to see if there are best practices for using this
>>>> data. Thank you in advance for your time!
>>>> (examples)
>>>> Here are two more examples showing the disease descriptions in my
>>>> dataset. Is the UMLS interface robust to these various formats or do they
>>>> need to be an exact match?
>>>> ./query-umls-similarity-webinterface.pl --sab MSH --rel PAR/CHD "Testicular Neoplasms" "Amelogenesis imperfecta local hypoplastic form"
>>>> Default Settings:
>>>>   --default http://atlas.ahc.umn.edu/
>>>>   --measure path
>>>> User Settings:
>>>>   --rel PAR/CHD
>>>> (-1.0, 'Testicular Neoplasms', 'Amelogenesis imperfecta local
>>>> hypoplastic form')
>>>> ./query-umls-similarity-webinterface.pl --sab MSH --rel PAR/CHD "Hypotrichosis 2, 146520 (3)" "PERIODONTITIS, LOCALIZED AGGRESSIVE"
>>>> Default Settings:
>>>>   --default http://atlas.ahc.umn.edu/
>>>>   --measure path
>>>> User Settings:
>>>>   --rel PAR/CHD
>>>> (-1.0, 'Hypotrichosis 2, 146520 (3)', 'PERIODONTITIS, LOCALIZED
>>>> AGGRESSIVE')
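The strings in these examples come in several shapes (title case, all caps, a trailing catalogue number). A light normalization step before any lookup might look like the sketch below. The suffix rule is an assumption inferred from the single example "Hypotrichosis 2, 146520 (3)", read as an OMIM-style "name, 6-digit number (mapping key)" entry, and lower-casing assumes the downstream lookup is case-insensitive. Normalization alone will not turn a descriptive phrase like "Amelogenesis imperfecta local hypoplastic form" into a string the UMLS indexes; that still needs a real mapping tool.

```python
import re

# Assumed OMIM-style suffix: ", <6-digit number> (<single-digit key>)" at the end.
OMIM_SUFFIX = re.compile(r",\s*\d{6}\s*\(\d\)\s*$")


def normalize_term(raw):
    """Strip the assumed suffix, collapse whitespace, and lower-case the term."""
    text = OMIM_SUFFIX.sub("", raw)
    text = re.sub(r"\s+", " ", text).strip()
    return text.lower()


print(normalize_term("Hypotrichosis 2, 146520 (3)"))          # hypotrichosis 2
print(normalize_term("PERIODONTITIS, LOCALIZED AGGRESSIVE"))  # periodontitis, localized aggressive
```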
>>>> --
>>>> Jennifer L. Wilson
>>>> Bioengineering, Stanford University
>>>> jen.wilson...@gmail.com / 703.969.3318
> --
> Jennifer L. Wilson
> Bioengineering, Stanford University
> jen.wilson...@gmail.com / 703.969.3318