Hi Jennifer,

Thanks for sharing this question. In general, if you have a choice
between using CUIs or terms with UMLS::Similarity, your best option is to
use the CUIs. Terms can map to multiple CUIs, and UMLS::Similarity might
pick a CUI associated with a sense of the term you aren't intending. Also,
if a term is misspelled or isn't specified exactly, it shows up as not
found. One useful resource for replicating similarity measure studies
(like the one you cite) is the following page, which includes term
mappings for several of the datasets we've worked with over the years.

http://www-users.cs.umn.edu/~bthomson/corpus/corpus.html

I will admit to being a little puzzled about the abortion/miscarriage
pair. The paper you cite clearly reports a value based on MSH, but when I
run that query now I get a value of -1 (even when using the CUIs). Each of
the CUIs is found in MSH, yet somehow we are not able to compute some of
the measures (path length, for example). This suggests that there is no
path between the two CUIs, which presumably comes down to the structure
of UMLS/MSH.

One quick and dirty way to see if a CUI is in MSH is to find the path
length between a CUI and itself. If it is present in MSH, that value will
be 1. We see that for each of the CUIs used for abortion and miscarriage.

tpederse@maraca:~$ perl query-umls-similarity-webinterface.pl --measure path --sab MSH C0156543 C0156543
Default Settings:
  --default http://atlas.ahc.umn.edu/
  --rel PAR/CHD
User Settings:
  --measure path

1<>Unspecified abortion NOS(C0156543)<>Unspecified abortion NOS(C0156543)

tpederse@maraca:~$ perl query-umls-similarity-webinterface.pl --measure path --sab MSH C0000786 C0000786
Default Settings:
  --default http://atlas.ahc.umn.edu/
  --rel PAR/CHD
User Settings:
  --measure path

1<>Abortions.spontaneous(C0000786)<>Abortions.spontaneous(C0000786)
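If you want to run this self-path check over a longer list of CUIs, it can be scripted. Below is a minimal Python sketch, assuming `perl` is on your PATH, `query-umls-similarity-webinterface.pl` is in the working directory, and the result comes back as a `score<>term(CUI)<>term(CUI)` line as in the runs above. `self_path_command`, `result_score`, and `in_source` are hypothetical helper names, not part of UMLS::Similarity.

```python
import subprocess

# Assumption: the script sits in the current directory and is run via perl.
SCRIPT = "query-umls-similarity-webinterface.pl"


def self_path_command(cui, sab="MSH"):
    """Build the argv for the 'path length from a CUI to itself' check."""
    return ["perl", SCRIPT, "--measure", "path", "--sab", sab, cui, cui]


def result_score(output):
    """Return the score from the last 'score<>term(CUI)<>term(CUI)' line.

    Assumption: result lines are the only lines containing '<>', as in the
    sample runs; the settings header is ignored.
    """
    result_lines = [line for line in output.splitlines() if "<>" in line]
    if not result_lines:
        raise ValueError("no result line found in output")
    return float(result_lines[-1].split("<>", 1)[0])


def in_source(cui, sab="MSH"):
    """True if the CUI's path length to itself is 1, i.e. it appears in `sab`."""
    completed = subprocess.run(
        self_path_command(cui, sab), capture_output=True, text=True, check=True
    )
    return result_score(completed.stdout) == 1
```

For example, `in_source("C0156543")` returns True when the script prints the same output as the first run above.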

However, when I try to find the path length between the two CUIs, I get -1.
This suggests that the CUIs are not joined by PAR/CHD relations. Note that
in the output below the terms for the CUIs have been looked up, which
shows that MSH knows about them:

tpederse@maraca:~$ perl query-umls-similarity-webinterface.pl --measure path --sab MSH C0156543 C0000786
Default Settings:
  --default http://atlas.ahc.umn.edu/
  --rel PAR/CHD
User Settings:
  --measure path

-1<>Unspecified abortion NOS(C0156543)<>Abortions.spontaneous(C0000786)
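If you are collecting many of these results programmatically (for example from a Python wrapper like the one you describe), the `score<>term(CUI)<>term(CUI)` line can be parsed to confirm which terms and CUIs were looked up and whether a path was found. This is a minimal sketch, assuming the output format stays as in these runs and that -1 is what comes back when no path is found; `PathResult` and `parse_result_line` are hypothetical names, not part of UMLS::Similarity.

```python
import re
from typing import NamedTuple

# Matches "preferred term(CUI)", e.g. "Abortions.spontaneous(C0000786)".
CONCEPT_RE = re.compile(r"^(?P<term>.*)\((?P<cui>C\d{7})\)$")


class PathResult(NamedTuple):
    score: float
    term1: str
    cui1: str
    term2: str
    cui2: str

    @property
    def no_path(self) -> bool:
        # Assumption: -1 signals that no path was found under the --rel relations.
        return self.score == -1


def parse_result_line(line: str) -> PathResult:
    """Split 'score<>term(CUI)<>term(CUI)' into its parts."""
    score, first, second = line.strip().split("<>")
    m1, m2 = CONCEPT_RE.match(first), CONCEPT_RE.match(second)
    if m1 is None or m2 is None:
        raise ValueError(f"unexpected concept format in: {line!r}")
    return PathResult(float(score), m1["term"], m1["cui"], m2["term"], m2["cui"])
```

Parsing the line above gives both CUIs with their looked-up terms and `no_path == True`, which is the "known to MSH but not connected" situation described here.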

So, in any case, it would appear that something has changed in the
structure of MSH since we reported our results in the 2009 AMIA paper you
mention, though I'm not sure what that change is. But I think the general
message is that if you can use CUIs, it will normally be more reliable to
do so. Mapping terms to CUIs is of course its own problem, but
UMLS::Similarity doesn't do anything terribly fancy there, so whatever
mapping you do will probably be more extensive and reliable than what
UMLS::Similarity would do.
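As a rough illustration of doing that mapping on your side, the sketch below normalizes case and whitespace, looks terms up in a term-to-CUI table you supply, and separates out terms that are missing or map to more than one CUI, so those can be resolved before querying with CUIs. The table and the helpers `normalize`, `build_index`, and `map_terms` are hypothetical; building a real table (from the term mappings on the page above, or your own curation) is the actual work.

```python
import re
from collections import defaultdict


def normalize(term):
    """Lower-case and collapse whitespace so case/spacing variants compare equal."""
    return re.sub(r"\s+", " ", term.strip().lower())


def build_index(pairs):
    """Build a normalized term -> set of CUIs index from (term, cui) pairs you supply."""
    index = defaultdict(set)
    for term, cui in pairs:
        index[normalize(term)].add(cui)
    return index


def map_terms(terms, index):
    """Split terms into unambiguous mappings, ambiguous ones, and ones not found."""
    mapped, ambiguous, missing = {}, {}, []
    for term in terms:
        cuis = index.get(normalize(term), set())  # .get does not add keys
        if len(cuis) == 1:
            mapped[term] = next(iter(cuis))
        elif cuis:
            ambiguous[term] = sorted(cuis)
        else:
            missing.append(term)
    return mapped, ambiguous, missing
```

With an index built from the two terms in the runs above, "ABORTIONS.SPONTANEOUS" maps to C0000786, while a string like "Hypotrichosis 2, 146520 (3)" lands in `missing` and needs manual attention.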

I hope this helps somehow, and please do feel free to follow up. Thoughts
from other users on this issue would also be most welcome!

Cordially,
Ted

On Sat, May 27, 2017 at 12:18 PM, Jennifer Wilson jen.wilson...@gmail.com
[umls-similarity] <umls-similarity@yahoogroups.com> wrote:

>
>
> Hi all,
>
> I'm resending this now that I'm subscribed. Any advice would be much
> appreciated! Thank you,
>
> ---------- Forwarded message ----------
> From: Jennifer Wilson <jen.wilson...@gmail.com>
> Date: Tue, May 23, 2017 at 6:13 PM
> Subject: Help with the best approach for using the query-UMLS interface
> To: umls-similarity@yahoogroups.com
>
>
> Hello UMLS similarity team,
>
> I am trying to compute the similarity between ~30K disease/phenotype
> terms. Ideally, I would have a matrix of similarity for these terms.
>
> My first attempt was to write a Python script to call the
> query-umls-similarity-webinterface.pl script. Before running the script on
> my whole dataset, though, I tried to recreate the scores in Table 1 of this
> paper (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2815481/).
>
> Here's the command I am using:
>
> ./query-umls-similarity-webinterface.pl --sab MSH --rel PAR/CHD "Abortion" "Miscarriage"
>
> Default Settings:
>   --default http://atlas.ahc.umn.edu/
>   --measure path
> User Settings:
>   --rel PAR/CHD
>
> (-1.0, 'Abortion', 'Miscarriage')
>
> I also have not processed the text in my dataset much. I have basically
> pulled diseases and phenotypes from DisGeNET, OMIM, PheWAS, and the GWAS
> Catalog. If I'm using data from all of these sources, do you recommend
> sending them directly to the query interface? Should I try to map them to
> CUIs? (examples below)
>
> Before I download the database and attempt to query the database (it's not
> a language that I use in my current work), I just wanted an outside
> perspective to see if there are best practices for using this data. Thank
> you in advance for your time!
>
> (examples)
> Here are two more examples showing the disease descriptions in my dataset.
> Is the UMLS interface robust to these various formats or do they need to be
> an exact match?
>
> ./query-umls-similarity-webinterface.pl --sab MSH --rel PAR/CHD "Testicular Neoplasms" "Amelogenesis imperfecta local hypoplastic form"
>
> Default Settings:
>   --default http://atlas.ahc.umn.edu/
>   --measure path
> User Settings:
>   --rel PAR/CHD
>
> (-1.0, 'Testicular Neoplasms', 'Amelogenesis imperfecta local hypoplastic form')
>
>
>
> ./query-umls-similarity-webinterface.pl --sab MSH --rel PAR/CHD "Hypotrichosis 2, 146520 (3)" "PERIODONTITIS, LOCALIZED AGGRESSIVE"
>
> Default Settings:
>   --default http://atlas.ahc.umn.edu/
>   --measure path
> User Settings:
>   --rel PAR/CHD
>
> (-1.0, 'Hypotrichosis 2, 146520 (3)', 'PERIODONTITIS, LOCALIZED AGGRESSIVE')
>
>
>
> --
> Jennifer L. Wilson
> Bioengineering, Stanford University
> jen.wilson...@gmail.com / 703.969.3318