Hi Ted,

Thank you again for all of this. I'm sorry I had to put down this project
for a few days and am only now getting back to it.

I see now that ontologies change, so reproducing that result might not be
the best sanity check for the scripts that I wrote.
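
For anyone following the thread, here is a minimal Python sketch of a sanity
check that does not depend on reproducing a published score. It follows the
self-path check Ted describes in the quoted message (the path measure between
a CUI and itself is 1 when the CUI is present in the source) and parses result
lines in the score<>term(CUI)<>term(CUI) shape shown in that output. The
function names and the regular expression are assumptions for illustration,
not part of UMLS::Similarity, and the line format is inferred only from the
examples in this thread.

```python
import re

# Assumed result-line shape, inferred from the output quoted in this thread:
#   1<>Unspecified abortion NOS(C0156543)<>Unspecified abortion NOS(C0156543)
LINE_RE = re.compile(
    r"^(-?\d+(?:\.\d+)?)<>(.*)\((C\d{7})\)<>(.*)\((C\d{7})\)$"
)


def parse_result(line):
    """Return (score, cui1, cui2) for a result line, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    score, _term1, cui1, _term2, cui2 = match.groups()
    return float(score), cui1, cui2


def passes_self_check(line):
    """True if the line is a CUI-to-itself query that returned a path value of 1."""
    parsed = parse_result(line)
    if parsed is None:
        return False
    score, cui1, cui2 = parsed
    return cui1 == cui2 and score == 1.0


if __name__ == "__main__":
    self_line = "1<>Unspecified abortion NOS(C0156543)<>Unspecified abortion NOS(C0156543)"
    pair_line = "-1<>Unspecified abortion NOS(C0156543)<>Abortions.spontaneous(C0000786)"
    print(passes_self_check(self_line))
    print(parse_result(pair_line))
```

A -1 for two CUIs that each pass the self-check would then suggest a missing
PAR/CHD path rather than a failed lookup, although the findPathToRoot.pl
output quoted below suggests that passing the self-check may not guarantee
that a CUI is still connected to the rest of MSH.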

I'm going to try to figure out how to map to CUI terms and I'll be in
touch if I get stuck again. Thanks,

On Sun, May 28, 2017 at 10:59 AM, Ted Pedersen duluth...@gmail.com
[umls-similarity] <umls-similarity@yahoogroups.com> wrote:

>
>
> This is perhaps a bit more than you were looking for, but there are quite
> a few command line tools available with UMLS::Similarity when you install
> locally that can be helpful for digging into situations like this. When I
> look for the path from each of these CUIs to the ROOT (of MSH) I find that
> one of them does not have a path to the root, while the other does (see
> command output below).
>
> The lack of a path to the root is going to cause a lot of measures to
> report a -1 value (since path, for example, relies on finding this path as
> a part of its computation). In fact, not having a path to the root makes me
> question if C0156543 is in MSH at all, so it might even be that the CUI is
> no longer a part of MSH (and not just lacking a path to the root). But,
> regardless, clearly something has changed since 2009 that is causing this
> measure to return a different value. This happens in some cases since UMLS
> continues to evolve and CUIs are added, removed, etc. It's important to
> know what version of the UMLS a previous study has used if you are
> interested in getting a very exact comparison. In the case of our AMIA 2009
> paper we used 2008AB, so things have no doubt changed a bit since then.
>
> tpederse@maraca:~$ findPathToRoot.pl C0156543
>
> UMLS-Interface Configuration Information:
> (Default Information - no config file)
>
>   Sources (SAB):
>      MSH
>   Relations (REL):
>      PAR
>      CHD
>
>   Sources (SABDEF):
>      UMLS_ALL
>   Relations (RELDEF):
>      UMLS_ALL
>
>
> There are no paths from the given C0156543 to the root.
> tpederse@maraca:~$ findPathToRoot.pl C0000786
>
>
> UMLS-Interface Configuration Information:
> (Default Information - no config file)
>
>   Sources (SAB):
>      MSH
>   Relations (REL):
>      PAR
>      CHD
>
>   Sources (SABDEF):
>      UMLS_ALL
>   Relations (RELDEF):
>      UMLS_ALL
>
>
> The paths between abortions, spontaneous (C0000786) and the root:
>   => C0000000 (**UMLS ROOT**) C1135584 (mesh headings) C1256739 (mesh
> descriptors) C1256741 (topical descriptor) C0012674 (diseases (mesh
> category)) C1720765 (female urogenital dis pregnancy compl) C0032962 (compl
> pregn) C0000786 (abortions, spontaneous)
>
>
> On Sun, May 28, 2017 at 12:43 PM, Ted Pedersen <duluth...@gmail.com>
> wrote:
>
>> Hi Jennifer,
>>
>> Thanks for sharing this question. I think in general if you have a choice
>> between using CUIs or terms with UMLS::Similarity, your best option is to
>> use the CUIs. Terms can map to multiple CUIs, and UMLS::Similarity might
>> pick a CUI associated with a sense of the term you aren't intending. Also,
>> if you misspell a term or don't specify it exactly correctly, then it shows
>> up as not found. One useful resource for replicating similarity measure
>> studies (like the one you cite) is the following page which includes term
>> mappings for several of the datasets we've worked with over the years.
>>
>> http://www-users.cs.umn.edu/~bthomson/corpus/corpus.html
>>
>> I will admit to being a little puzzled about the case of abortion -
>> miscarriage. The paper you cite clearly reports a value based on MSH, but
>> as I try to run that query now I get a value of -1 (even when using the
>> CUIs). However, it appears that each of the CUIs is found in MSH, but that
>> somehow we are not able to compute some of the measures (a path length, for
>> example). This suggests that there is not a path between the two CUIs,
>> which has something to do with the structure of UMLS/MSH.
>>
>> One quick and dirty way to see if a CUI is in MSH is to find the path
>> length between a CUI and itself. If it is present in MSH, that value will
>> be 1. We see that for each of the CUIs used for abortion and miscarriage.
>>
>> tpederse@maraca:~$ perl query-umls-similarity-webinterface.pl --measure
>> path --sab MSH C0156543 C0156543
>> Default Settings:
>>   --default http://atlas.ahc.umn.edu/
>>   --rel PAR/CHD
>> User Settings:
>>   --measure path
>>
>> 1<>Unspecified abortion NOS(C0156543)<>Unspecified abortion NOS(C0156543)
>>
>> tpederse@maraca:~$ perl query-umls-similarity-webinterface.pl --measure
>> path --sab MSH C0000786 C0000786
>> Default Settings:
>>   --default http://atlas.ahc.umn.edu/
>>   --rel PAR/CHD
>> User Settings:
>>   --measure path
>>
>> 1<>Abortions.spontaneous(C0000786)<>Abortions.spontaneous(C0000786)
>>
>> However, when I try to find the path length between the two CUIs, I get
>> -1. This suggests that the CUIs are not joined by PAR/CHD relations...note
>> that below you can see that the terms for the CUIs have been looked up,
>> which shows us that MSH knows about them...
>>
>> tpederse@maraca:~$ perl query-umls-similarity-webinterface.pl --measure
>> path --sab MSH C0156543 C0000786
>> Default Settings:
>>   --default http://atlas.ahc.umn.edu/
>>   --rel PAR/CHD
>> User Settings:
>>   --measure path
>>
>> -1<>Unspecified abortion NOS(C0156543)<>Abortions.spontaneous(C0000786)
>>
>> So, in any case, it would appear that something has changed in the
>> structure of MSH since we reported our results in the 2009 AMIA paper you
>> mention. I'm not sure what that is. But, I think the general message is
>> that if you can use CUIs it will normally be more reliable to do that.
>> Mapping terms to CUIs is of course its own problem, but UMLS::Similarity
>> doesn't do anything terribly fancy with that, and so probably whatever you
>> do will be more extensive and reliable than what UMLS::Similarity would
>> do...
>>
>> I hope this helps somehow, and please do feel free to follow up. Thoughts
>> from other users on this issue would also be most welcome!
>>
>> Cordially,
>> Ted
>>
>> On Sat, May 27, 2017 at 12:18 PM, Jennifer Wilson jen.wilson...@gmail.com
>> [umls-similarity] <umls-similarity@yahoogroups.com> wrote:
>>
>>>
>>>
>>> Hi all,
>>>
>>> I'm resending this now that I'm subscribed. Any advice would be much
>>> appreciated! Thank you,
>>>
>>> ---------- Forwarded message ----------
>>> From: Jennifer Wilson <jen.wilson...@gmail.com>
>>> Date: Tue, May 23, 2017 at 6:13 PM
>>> Subject: Help with the best approach for using the query-UMLS interface
>>> To: umls-similarity@yahoogroups.com
>>>
>>>
>>> Hello UMLS similarity team,
>>>
>>> I am trying to compute the similarity between ~30K disease/phenotype
>>> terms. Ideally, I would have a matrix of similarity for these terms.
>>>
>>> My first attempt was to write a python script to call the
>>> query-umls-similarity-webinterface.pl script. Though, before releasing
>>> the script on my dataset, I was trying to recreate the scores from this
>>> paper (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2815481/) in table
>>> 1.
>>>
>>> Here's the command I am using:
>>>
>>> ./query-umls-similarity-webinterface.pl --sab MSH --rel PAR/CHD
>>> "Abortion" "Miscarriage"
>>>
>>> Default Settings:
>>>
>>>   --default http://atlas.ahc.umn.edu/
>>>
>>>   --measure path
>>>
>>>
>>> User Settings:
>>>
>>>   --rel PAR/CHD
>>>
>>>
>>> (-1.0, 'Abortion', 'Miscarriage')
>>>
>>> I also have not processed the text in my dataset much. I have basically
>>> pulled diseases and phenotypes from DisGeNet, OMIM, PheWas, and the GWAS
>>> catalogue. If I'm using data from all of these sources - do you recommend
>>> sending them directly to the query interface? Should I try and map to CUI
>>> terms? (examples below)
>>>
>>> Before I download the database and attempt to query the database (it's
>>> not a language that I use in my current work), I just wanted an outside
>>> perspective to see if there are best practices for using this data. Thank
>>> you in advance for your time!
>>>
>>> (examples)
>>> Here are two more examples showing the disease descriptions in my
>>> dataset. Is the UMLS interface robust to these various formats or do they
>>> need to be an exact match?
>>>
>>> ./query-umls-similarity-webinterface.pl --sab MSH --rel PAR/CHD
>>> "Testicular Neoplasms" "Amelogenesis imperfecta local hypoplastic form"
>>>
>>> Default Settings:
>>>
>>>   --default http://atlas.ahc.umn.edu/
>>>
>>>   --measure path
>>>
>>>
>>> User Settings:
>>>
>>>   --rel PAR/CHD
>>>
>>>
>>> (-1.0, 'Testicular Neoplasms', 'Amelogenesis imperfecta local
>>> hypoplastic form')
>>>
>>>
>>>
>>> ./query-umls-similarity-webinterface.pl --sab MSH --rel PAR/CHD
>>> "Hypotrichosis 2, 146520 (3)" "PERIODONTITIS, LOCALIZED AGGRESSIVE"
>>>
>>> Default Settings:
>>>
>>>   --default http://atlas.ahc.umn.edu/
>>>
>>>   --measure path
>>>
>>>
>>> User Settings:
>>>
>>>   --rel PAR/CHD
>>>
>>>
>>> (-1.0, 'Hypotrichosis 2, 146520 (3)', 'PERIODONTITIS, LOCALIZED
>>> AGGRESSIVE')
>>>
>>>
>>>
>>> --
>>> Jennifer L. Wilson
>>> Bioengineering, Stanford University
>>> jen.wilson...@gmail.com / 703.969.3318
>>>
>>>
>>>
>>>
>>>
>>
> 
>



-- 
Jennifer L. Wilson
Bioengineering, Stanford University
jen.wilson...@gmail.com / 703.969.3318