Hi Jennifer,

Mapping terms to CUIs is its own problem, and there are a few nice tools
already available that might be of some use. We've used MetaMap to good
effect for this problem, so you might want to consider looking there.

https://metamap.nlm.nih.gov/

I'd be curious if other users have recommendations as well.
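
In case it helps with the Python script you mentioned, here is a minimal
sketch (not part of UMLS::Similarity) for parsing the result lines printed by
query-umls-similarity-webinterface.pl once you are passing CUIs. It assumes
the result lines look like the ones quoted further down in this thread,
score<>term(CUI)<>term(CUI), and it treats -1 as "no value could be
computed". The names LINE_RE, Result, parse_result_line and parse_output are
made up for illustration.

```python
import re
from typing import List, NamedTuple, Optional

# Assumed result-line shape, copied from the output quoted in this thread:
#   score<>term(CUI)<>term(CUI)
# A CUI is the letter C followed by seven digits.
LINE_RE = re.compile(
    r"^(-?\d+(?:\.\d+)?)<>(.*)\((C\d{7})\)<>(.*)\((C\d{7})\)$"
)


class Result(NamedTuple):
    score: Optional[float]  # None when the script reported a negative value
    term1: str
    cui1: str
    term2: str
    cui2: str


def parse_result_line(line: str) -> Optional[Result]:
    """Parse one output line; return None if it is not a result line."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    raw = float(match.group(1))
    # -1 means the measure could not be computed (for example, no path).
    score = None if raw < 0 else raw
    return Result(score, match.group(2), match.group(3),
                  match.group(4), match.group(5))


def parse_output(text: str) -> List[Result]:
    """Collect every result line from the script's captured output."""
    parsed = (parse_result_line(line) for line in text.splitlines())
    return [result for result in parsed if result is not None]
```

Anything that does not match, such as the settings header, is skipped, so you
can feed it the whole captured output. Lines for term-based queries may look
different and would need their own handling.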

Good luck,
Ted

On Fri, Jun 2, 2017 at 7:56 PM, Jennifer Wilson jen.wilson...@gmail.com
[umls-similarity] <umls-similarity@yahoogroups.com> wrote:

>
>
> Hi Ted,
>
> Thank you again for all of this. I'm sorry I had to put down this project
> for a few days and am only now getting back to it.
>
> I see that ontologies change and reproducing that result might not be the
> best sanity check on the scripts that I wrote.
>
> I'm going to try and figure out how to map to CUI terms and I'll be in
> touch if I get stuck again. Thanks,
>
> On Sun, May 28, 2017 at 10:59 AM, Ted Pedersen duluth...@gmail.com
> [umls-similarity] <umls-similarity@yahoogroups.com> wrote:
>
>>
>>
>> This is perhaps a bit more than you were looking for, but there are quite
>> a few command line tools available with UMLS::Similarity when you install
>> locally that can be helpful for digging into situations like this. When I
>> look for the path from each of these CUIs to the ROOT (of MSH) I find that
>> one of them does not have a path to the root, while the other does (see
>> command output below).
>>
>> The lack of a path to the root is going to cause a lot of measures to
>> report a -1 value (since path, for example, relies on finding this path as
>> a part of its computation). In fact, not having a path to the root makes me
>> question if C0156543 is in MSH at all, so it might even be that the CUI is
>> no longer a part of MSH (and not just lacking a path to the root). But,
>> regardless, clearly something has changed since 2009 that is causing this
>> measure to return a different value. This happens in some cases since UMLS
>> continues to evolve and CUIs are added, removed, etc. It's important to
>> know what version of the UMLS a previous study has used if you are
>> interested in getting a very exact comparison. In the case of our AMIA 2009
>> paper we used 2008AB, so things have no doubt changed a bit since then.
>>
>> tpederse@maraca:~$ findPathToRoot.pl C0156543
>>
>> UMLS-Interface Configuration Information:
>> (Default Information - no config file)
>>
>>   Sources (SAB):
>>      MSH
>>   Relations (REL):
>>      PAR
>>      CHD
>>
>>   Sources (SABDEF):
>>      UMLS_ALL
>>   Relations (RELDEF):
>>      UMLS_ALL
>>
>>
>> There are no paths from the given C0156543 to the root.
>> tpederse@maraca:~$ findPathToRoot.pl C0000786
>>
>>
>> UMLS-Interface Configuration Information:
>> (Default Information - no config file)
>>
>>   Sources (SAB):
>>      MSH
>>   Relations (REL):
>>      PAR
>>      CHD
>>
>>   Sources (SABDEF):
>>      UMLS_ALL
>>   Relations (RELDEF):
>>      UMLS_ALL
>>
>>
>> The paths between abortions, spontaneous (C0000786) and the root:
>>   => C0000000 (**UMLS ROOT**) C1135584 (mesh headings) C1256739 (mesh
>> descriptors) C1256741 (topical descriptor) C0012674 (diseases (mesh
>> category)) C1720765 (female urogenital dis pregnancy compl) C0032962 (compl
>> pregn) C0000786 (abortions, spontaneous)
>>
>>
>> On Sun, May 28, 2017 at 12:43 PM, Ted Pedersen <duluth...@gmail.com>
>> wrote:
>>
>>> Hi Jennifer,
>>>
>>> Thanks for sharing this question. I think in general if you have a
>>> choice between using CUIs or terms with UMLS::Similarity, your best option
>>> is to use the CUIs. Terms can map to multiple CUIs, and UMLS::Similarity
>>> might pick a CUI associated with a sense of the term you aren't intending.
>>> Also, if you misspell a term or don't specify it exactly correctly, then it
>>> shows up as not found. One useful resource for replicating similarity
>>> measure studies (like the one you cite) is the following page which
>>> includes term mappings for several of the datasets we've worked with over
>>> the years.
>>>
>>> http://www-users.cs.umn.edu/~bthomson/corpus/corpus.html
>>>
>>> I will admit to being a little puzzled about the case of abortion -
>>> miscarriage. The paper you cite clearly reports a value based on MSH, but
>>> as I try to run that query now I get a value of -1 (even when using the
>>> CUIs). However, it appears that each of the CUIs is found in MSH, but that
>>> somehow we are not able to compute some of the measures (a path length, for
>>> example). This suggests that there is not a path between the two CUIs,
>>> which has something to do with the structure of UMLS/MSH.
>>>
>>> One quick and dirty way to see if a CUI is in MSH is to find the path
>>> length between a CUI and itself. If it is present in MSH, that value will
>>> be 1. We see that for each of the CUIs used for abortion and miscarriage.
>>>
>>> tpederse@maraca:~$ perl query-umls-similarity-webinterface.pl --measure
>>> path --sab MSH C0156543 C0156543
>>> Default Settings:
>>>   --default http://atlas.ahc.umn.edu/
>>>   --rel PAR/CHD
>>> User Settings:
>>>   --measure path
>>>
>>> 1<>Unspecified abortion NOS(C0156543)<>Unspecified abortion NOS(C0156543)
>>>
>>> tpederse@maraca:~$ perl query-umls-similarity-webinterface.pl --measure
>>> path --sab MSH C0000786 C0000786
>>> Default Settings:
>>>   --default http://atlas.ahc.umn.edu/
>>>   --rel PAR/CHD
>>> User Settings:
>>>   --measure path
>>>
>>> 1<>Abortions.spontaneous(C0000786)<>Abortions.spontaneous(C0000786)
>>>
>>> However, when I try to find the path length between the two CUIs, I get
>>> -1. This suggests that the CUIs are not joined by PAR/CHD relations...note
>>> that below you can see that the terms for the CUIs have been looked up,
>>> which shows us that MSH knows about them...
>>>
>>> tpederse@maraca:~$ perl query-umls-similarity-webinterface.pl --measure
>>> path --sab MSH C0156543 C0000786
>>> Default Settings:
>>>   --default http://atlas.ahc.umn.edu/
>>>   --rel PAR/CHD
>>> User Settings:
>>>   --measure path
>>>
>>> -1<>Unspecified abortion NOS(C0156543)<>Abortions.spontaneous(C0000786)
>>>
>>> So, in any case, it would appear that something has changed in the
>>> structure of MSH since we reported our results in the 2009 AMIA paper you
>>> mention. I'm not sure what that is. But, I think the general message is
>>> that if you can use CUIs it will normally be more reliable to do that.
>>> Mapping terms to CUIs is of course its own problem, but UMLS::Similarity
>>> doesn't do anything terribly fancy with that, and so probably whatever you
>>> do will be more extensive and reliable than what UMLS::Similarity would
>>> do...
>>>
>>> I hope this helps somehow, and please do feel free to follow up.
>>> Thoughts from other users on this issue would also be most welcome!
>>>
>>> Cordially,
>>> Ted
>>>
>>> On Sat, May 27, 2017 at 12:18 PM, Jennifer Wilson jen.wilson...@gmail.com
>>> [umls-similarity] <umls-similarity@yahoogroups.com> wrote:
>>>
>>>>
>>>>
>>>> Hi all,
>>>>
>>>> I'm resending this now that I'm subscribed. Any advice would be much
>>>> appreciated! Thank you,
>>>>
>>>> ---------- Forwarded message ----------
>>>> From: Jennifer Wilson <jen.wilson...@gmail.com>
>>>> Date: Tue, May 23, 2017 at 6:13 PM
>>>> Subject: Help with the best approach for using the query-UMLS interface
>>>> To: umls-similarity@yahoogroups.com
>>>>
>>>>
>>>> Hello UMLS similarity team,
>>>>
>>>> I am trying to compute the similarity between ~30K disease/phenotype
>>>> terms. Ideally, I would have a matrix of similarity for these terms.
>>>>
>>>> My first attempt was to write a python script to call the
>>>> query-umls-similarity-webinterface.pl script. Though, before releasing
>>>> the script on my dataset, I was trying to recreate the scores from this
>>>> paper (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2815481/) in table
>>>> 1.
>>>>
>>>> Here's the command I am using:
>>>>
>>>> ./query-umls-similarity-webinterface.pl --sab MSH --rel PAR/CHD
>>>> "Abortion" "Miscarriage"
>>>>
>>>> Default Settings:
>>>>
>>>>   --default http://atlas.ahc.umn.edu/
>>>>
>>>>   --measure path
>>>>
>>>>
>>>> User Settings:
>>>>
>>>>   --rel PAR/CHD
>>>>
>>>>
>>>> (-1.0, 'Abortion', 'Miscarriage')
>>>>
>>>> I also have not processed the text in my dataset much. I have basically
>>>> pulled diseases and phenotypes from DisGeNET, OMIM, PheWAS, and the GWAS
>>>> catalogue. If I'm using data from all of these sources - do you recommend
>>>> sending them directly to the query interface? Should I try and map to CUI
>>>> terms? (examples below)
>>>>
>>>> Before I download the database and attempt to query the database (it's
>>>> not a language that I use in my current work), I just wanted an outside
>>>> perspective to see if there are best practices for using this data. Thank
>>>> you in advance for your time!
>>>>
>>>> (examples)
>>>> Here are two more examples showing the disease descriptions in my
>>>> dataset. Is the UMLS interface robust to these various formats or do they
>>>> need to be an exact match?
>>>>
>>>> ./query-umls-similarity-webinterface.pl --sab MSH --rel PAR/CHD
>>>> "Testicular Neoplasms" "Amelogenesis imperfecta local hypoplastic form"
>>>>
>>>> Default Settings:
>>>>
>>>>   --default http://atlas.ahc.umn.edu/
>>>>
>>>>   --measure path
>>>>
>>>>
>>>> User Settings:
>>>>
>>>>   --rel PAR/CHD
>>>>
>>>>
>>>> (-1.0, 'Testicular Neoplasms', 'Amelogenesis imperfecta local
>>>> hypoplastic form')
>>>>
>>>>
>>>>
>>>> ./query-umls-similarity-webinterface.pl --sab MSH --rel PAR/CHD
>>>> "Hypotrichosis 2, 146520 (3)" "PERIODONTITIS, LOCALIZED AGGRESSIVE"
>>>>
>>>> Default Settings:
>>>>
>>>>   --default http://atlas.ahc.umn.edu/
>>>>
>>>>   --measure path
>>>>
>>>>
>>>> User Settings:
>>>>
>>>>   --rel PAR/CHD
>>>>
>>>>
>>>> (-1.0, 'Hypotrichosis 2, 146520 (3)', 'PERIODONTITIS, LOCALIZED
>>>> AGGRESSIVE')
>>>>
>>>>
>>>>
>>>> --
>>>> Jennifer L. Wilson
>>>> Bioengineering, Stanford University
>>>> jen.wilson...@gmail.com / 703.969.3318 <(703)%20969-3318>
>>>>
>>>>
>>>>
>>>>
>>>
>>
>
>
> --
> Jennifer L. Wilson
> Bioengineering, Stanford University
> jen.wilson...@gmail.com / 703.969.3318 <(703)%20969-3318>