Hey Ted,

So I haven't quite figured out MetaMap, but I have a set of diseases
that I mapped to CUIs another way. I'm still getting negative results with
diseases that I think should be "similar". For example:

./query-umls-similarity-webinterface.pl --sab MSH --rel PAR/CHD "C1864828"
"C3810041"

Default Settings:

  --default http://atlas.ahc.umn.edu/

  --measure path


User Settings:

  --rel PAR/CHD


["b'-1", 'ALZHEIMER DISEASE 10(C1864828)', "ALZHEIMER DISEASE
18(C3810041)\\n'"]

You can see my results on the last line. Could you advise: would you expect
that these two CUIs would not be similar? I chose the path measure as a
simple starting point, but would you recommend another measure that might
be more informative? Thanks again for your help!
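
A minimal Python sketch of how a result line could be parsed, assuming the
script prints results in the `score<>term(CUI)<>term(CUI)` form shown in the
quoted script output further down this thread. The names `RESULT_RE` and
`parse_result` are hypothetical and not part of UMLS::Similarity. The `b'-1`
in the output above suggests the raw bytes were turned into text with str();
decoding them first avoids the stray `b'` and `\\n'` fragments.

```python
import re

# Assumed line format (from the quoted script output in this thread):
#   score<>term1(CUI1)<>term2(CUI2)
RESULT_RE = re.compile(
    r"^(-?\d+(?:\.\d+)?)<>(.*)\((C\d{7})\)<>(.*)\((C\d{7})\)$"
)


def parse_result(line):
    """Return (score, cui1, cui2); score is None when the script reports -1."""
    # Output captured from a subprocess is bytes, so decode before parsing.
    if isinstance(line, bytes):
        line = line.decode("utf-8")
    match = RESULT_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised result line: {line!r}")
    score = float(match.group(1))
    # A negative score means no value could be computed for the pair.
    return (None if score < 0 else score, match.group(3), match.group(5))
```

Returning None for -1 keeps "no value could be computed" separate from a
genuinely low score when the pairs are later collected into a matrix.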

On Mon, Jun 5, 2017 at 1:43 PM, Jennifer Wilson <jen.wilson...@gmail.com>
wrote:

> Hey Ted,
>
> Thanks for all of the help. I found the interactive interface really
> helpful and had been able to create inputs similar to what you shared. I
> have an open help ticket now about getting the file to download. They gave
> me some commands that I had already tried, so there must be
> something more to unzipping the code...
>
> Thanks again. Hopefully I'm close to a solution!
>
> On Mon, Jun 5, 2017 at 11:21 AM, Ted Pedersen duluth...@gmail.com
> [umls-similarity] <umls-similarity@yahoogroups.com> wrote:
>
>>
>>
>> Hi Jen,
>>
>> Nothing to be embarrassed about at all! If you haven't already used
>> MetaMap interactively, you might want to try that before you attempt a local
>> install:
>>
>> https://ii.nlm.nih.gov/Interactive/UTS_Required/metamap.shtml
>>
>> (You would need to be logged into UTS for the link to work I think...)
>>
>> Anyway, once at that site on the right side there are some links for
>> using MetaMap interactively. Below is an example of what that looks like
>> (where the first line is my input and the rest is the output). I turned on
>> the option to show CUIs, since I think that is your desired output...
>>
>> About the bz2 file, I think you'd need to uncompress that with bunzip2,
>> although I have not done a local install for a while so I am not 100
>> percent sure if that is the issue or not. But I've cc'd the MetaMap help
>> line on this note; they are usually very good about following up on issues
>> like this.
>>
>> I hope this helps!
>> Ted
>>
>> Processing 00000000.tx.1: I have a really bad headache, and my joints ache.
>>
>> Phrase: I
>> >>>>> Phrase
>> i
>> <<<<< Phrase
>> >>>>> Mappings
>> Meta Mapping (1000):
>>   1000   C0021966:I- (Iodides) [Inorganic Chemical]
>> Meta Mapping (1000):
>>   1000   C0221138:I NOS (Blood group antibody I) [Amino Acid, Peptide, or Protein,Immunologic Factor]
>> <<<<< Mappings
>>
>> Phrase: have
>> >>>>> Phrase
>> <<<<< Phrase
>>
>> Phrase: a really bad headache,
>> >>>>> Phrase
>> really bad headache
>> <<<<< Phrase
>> >>>>> Mappings
>> Meta Mapping (790):
>>    660   C0205169:Bad [Qualitative Concept]
>>    827   C0018681:HEADACHE (Headache) [Sign or Symptom]
>> <<<<< Mappings
>>
>> Phrase: and
>> >>>>> Phrase
>> <<<<< Phrase
>>
>> Phrase: my joints
>> >>>>> Phrase
>> joints
>> <<<<< Phrase
>> >>>>> Mappings
>> Meta Mapping (1000):
>>   1000   C0022417:Joints [Body Space or Junction]
>> Meta Mapping (1000):
>>   1000   C0392905:Joints (Articular system) [Body System]
>> <<<<< Mappings
>>
>> Phrase: ache.
>> >>>>> Phrase
>> ache
>> <<<<< Phrase
>> >>>>> Mappings
>> Meta Mapping (1000):
>>   1000   C0234238:ACHE (Ache) [Sign or Symptom]
>> <<<<< Mappings
>>
>>
>>
>> On Mon, Jun 5, 2017 at 12:25 PM, Jennifer Wilson jen.wilson...@gmail.com
>> [umls-similarity] <umls-similarity@yahoogroups.com> wrote:
>>
>>>
>>>
>>> Hey Ted,
>>>
>>> I'm (embarrassingly) having some trouble navigating the NLM site. I
>>> think I have an account and am trying to download some of the MetaMap
>>> software (I think that the "Lite" version is sufficient). But when I
>>> download the bz2 file, it won't open because I think I need to authenticate
>>> it. Do you know how I'm supposed to access this software? Sorry if this is
>>> out of your realm, I can try someone else at NLM. This has just been a lot
>>> more difficult and confusing than I thought it should be! Thanks,
>>>
>>> On Fri, Jun 2, 2017 at 7:07 PM, Ted Pedersen duluth...@gmail.com
>>> [umls-similarity] <umls-similarity@yahoogroups.com> wrote:
>>>
>>>>
>>>>
>>>> Hi Jennifer,
>>>>
>>>> Mapping terms to CUIs is its own problem, and there are a few nice
>>>> tools already available that might be of some use. We've used MetaMap to
>>>> good effect for this problem, so you might want to consider looking there.
>>>>
>>>> https://metamap.nlm.nih.gov/
>>>>
>>>> I'd be curious if other users have recommendations as well.
>>>>
>>>> Good luck,
>>>> Ted
>>>>
>>>> On Fri, Jun 2, 2017 at 7:56 PM, Jennifer Wilson jen.wilson...@gmail.com
>>>> [umls-similarity] <umls-similarity@yahoogroups.com> wrote:
>>>>
>>>>>
>>>>>
>>>>> Hi Ted,
>>>>>
>>>>> Thank you again for all of this. I'm sorry I had to put down this
>>>>> project for a few days and am only now getting back to it.
>>>>>
>>>>> I see that ontologies change and reproducing that result might not be
>>>>> the best sanity check on the scripts that I wrote.
>>>>>
>>>>> I'm going to try and figure out how to map to CUI terms and I'll be in
>>>>> touch if I get stuck again. Thanks,
>>>>>
>>>>> On Sun, May 28, 2017 at 10:59 AM, Ted Pedersen duluth...@gmail.com
>>>>> [umls-similarity] <umls-similarity@yahoogroups.com> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> This is perhaps a bit more than you were looking for, but there are
>>>>>> quite a few command line tools available with UMLS::Similarity when you
>>>>>> install locally that can be helpful for digging into situations like
>>>>>> this. When I look for the path from each of these CUIs to the ROOT (of
>>>>>> MSH) I find that one of them does not have a path to the root, while the
>>>>>> other does (see command output below).
>>>>>>
>>>>>> The lack of a path to the root is going to cause a lot of measures
>>>>>> to report a -1 value (since path, for example, relies on finding this
>>>>>> path as a part of its computation). In fact, not having a path to the
>>>>>> root makes me question if C0156543 is in MSH at all, so it might even be
>>>>>> that the CUI is no longer a part of MSH (and not just lacking a path to
>>>>>> the root). But, regardless, clearly something has changed since 2009 that
>>>>>> is causing this measure to return a different value. This happens in some
>>>>>> cases since UMLS continues to evolve and CUIs are added, removed, etc.
>>>>>> It's important to know what version of the UMLS a previous study has used
>>>>>> if you are interested in getting a very exact comparison. In the case of
>>>>>> our AMIA 2009 paper we used 2008AB, so things have no doubt changed a bit
>>>>>> since then.
>>>>>>
>>>>>> tpederse@maraca:~$ findPathToRoot.pl C0156543
>>>>>>
>>>>>> UMLS-Interface Configuration Information:
>>>>>> (Default Information - no config file)
>>>>>>
>>>>>>   Sources (SAB):
>>>>>>      MSH
>>>>>>   Relations (REL):
>>>>>>      PAR
>>>>>>      CHD
>>>>>>
>>>>>>   Sources (SABDEF):
>>>>>>      UMLS_ALL
>>>>>>   Relations (RELDEF):
>>>>>>      UMLS_ALL
>>>>>>
>>>>>>
>>>>>> There are no paths from the given C0156543 to the root.
>>>>>> tpederse@maraca:~$ findPathToRoot.pl C0000786
>>>>>>
>>>>>>
>>>>>> UMLS-Interface Configuration Information:
>>>>>> (Default Information - no config file)
>>>>>>
>>>>>>   Sources (SAB):
>>>>>>      MSH
>>>>>>   Relations (REL):
>>>>>>      PAR
>>>>>>      CHD
>>>>>>
>>>>>>   Sources (SABDEF):
>>>>>>      UMLS_ALL
>>>>>>   Relations (RELDEF):
>>>>>>      UMLS_ALL
>>>>>>
>>>>>>
>>>>>> The paths between abortions, spontaneous (C0000786) and the root:
>>>>>>   => C0000000 (**UMLS ROOT**) C1135584 (mesh headings) C1256739 (mesh
>>>>>> descriptors) C1256741 (topical descriptor) C0012674 (diseases (mesh
>>>>>> category)) C1720765 (female urogenital dis pregnancy compl)
>>>>>> C0032962 (compl pregn) C0000786 (abortions, spontaneous)
>>>>>>
>>>>>>
>>>>>> On Sun, May 28, 2017 at 12:43 PM, Ted Pedersen <duluth...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Jennifer,
>>>>>>>
>>>>>>> Thanks for sharing this question. I think in general if you have a
>>>>>>> choice between using CUIs or terms with UMLS::Similarity, your best
>>>>>>> option is to use the CUIs. Terms can map to multiple CUIs, and
>>>>>>> UMLS::Similarity might pick a CUI associated with a sense of the term
>>>>>>> you aren't intending. Also, if you misspell a term or don't specify it
>>>>>>> exactly correctly, then it shows up as not found. One useful resource
>>>>>>> for replicating similarity measure studies (like the one you cite) is
>>>>>>> the following page, which includes term mappings for several of the
>>>>>>> datasets we've worked with over the years.
>>>>>>>
>>>>>>> http://www-users.cs.umn.edu/~bthomson/corpus/corpus.html
>>>>>>>
>>>>>>> I will admit to being a little puzzled about the case of abortion -
>>>>>>> miscarriage. The paper you cite clearly reports a value based on MSH,
>>>>>>> but as I try to run that query now I get a value of -1 (even when using
>>>>>>> the CUIs). However, it appears that each of the CUIs is found in MSH,
>>>>>>> but that somehow we are not able to compute some of the measures (a path
>>>>>>> length, for example). This suggests that there is not a path between the
>>>>>>> two CUIs, which has something to do with the structure of UMLS/MSH.
>>>>>>>
>>>>>>> One quick and dirty way to see if a CUI is in MSH is to find the
>>>>>>> path length between a CUI and itself. If it is present in MSH, that
>>>>>>> value will be 1. We see that for each of the CUIs used for abortion and
>>>>>>> miscarriage.
>>>>>>>
>>>>>>> tpederse@maraca:~$ perl query-umls-similarity-webinterface.pl
>>>>>>> --measure path --sab MSH C0156543 C0156543
>>>>>>> Default Settings:
>>>>>>>   --default http://atlas.ahc.umn.edu/
>>>>>>>   --rel PAR/CHD
>>>>>>> User Settings:
>>>>>>>   --measure path
>>>>>>>
>>>>>>> 1<>Unspecified abortion NOS(C0156543)<>Unspecified abortion
>>>>>>> NOS(C0156543)
>>>>>>>
>>>>>>> tpederse@maraca:~$ perl query-umls-similarity-webinterface.pl
>>>>>>> --measure path --sab MSH C0000786 C0000786
>>>>>>> Default Settings:
>>>>>>>   --default http://atlas.ahc.umn.edu/
>>>>>>>   --rel PAR/CHD
>>>>>>> User Settings:
>>>>>>>   --measure path
>>>>>>>
>>>>>>> 1<>Abortions.spontaneous(C0000786)<>Abortions.spontaneous(C0000786)
>>>>>>>
>>>>>>> However, when I try to find the path length between the two CUIs, I
>>>>>>> get -1. This suggests that the CUIs are not joined by PAR/CHD
>>>>>>> relations...note that below you can see that the terms for the CUIs have
>>>>>>> been looked up, which shows us that MSH knows about them...
>>>>>>>
>>>>>>> tpederse@maraca:~$ perl query-umls-similarity-webinterface.pl
>>>>>>> --measure path --sab MSH C0156543 C0000786
>>>>>>> Default Settings:
>>>>>>>   --default http://atlas.ahc.umn.edu/
>>>>>>>   --rel PAR/CHD
>>>>>>> User Settings:
>>>>>>>   --measure path
>>>>>>>
>>>>>>> -1<>Unspecified abortion NOS(C0156543)<>Abortions.spontaneous(C0000786)
>>>>>>>
>>>>>>> So, in any case, it would appear that something has changed in the
>>>>>>> structure of MSH since we reported our results in the 2009 AMIA paper
>>>>>>> you mention. I'm not sure what that is. But, I think the general message
>>>>>>> is that if you can use CUIs it will normally be more reliable to do that.
>>>>>>> Mapping terms to CUIs is of course its own problem, but UMLS::Similarity
>>>>>>> doesn't do anything terribly fancy with that, and so probably whatever
>>>>>>> you do will be more extensive and reliable than what UMLS::Similarity
>>>>>>> would do...
>>>>>>>
>>>>>>> I hope this helps somehow, and please do feel free to follow up.
>>>>>>> Thoughts from other users on this issue would also be most welcome!
>>>>>>>
>>>>>>> Cordially,
>>>>>>> Ted
>>>>>>>
>>>>>>> On Sat, May 27, 2017 at 12:18 PM, Jennifer Wilson
>>>>>>> jen.wilson...@gmail.com [umls-similarity] <
>>>>>>> umls-similarity@yahoogroups.com> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> I'm resending this now that I'm subscribed. Any advice would be
>>>>>>>> much appreciated! Thank you,
>>>>>>>>
>>>>>>>> ---------- Forwarded message ----------
>>>>>>>> From: Jennifer Wilson <jen.wilson...@gmail.com>
>>>>>>>> Date: Tue, May 23, 2017 at 6:13 PM
>>>>>>>> Subject: Help with the best approach for using the query-UMLS
>>>>>>>> interface
>>>>>>>> To: umls-similarity@yahoogroups.com
>>>>>>>>
>>>>>>>>
>>>>>>>> Hello UMLS similarity team,
>>>>>>>>
>>>>>>>> I am trying to compute the similarity between ~30K
>>>>>>>> disease/phenotype terms. Ideally, I would have a matrix of similarity
>>>>>>>> for these terms.
>>>>>>>>
>>>>>>>> My first attempt was to write a Python script to call the
>>>>>>>> query-umls-similarity-webinterface.pl script. Before running the
>>>>>>>> script on my dataset, though, I was trying to recreate the scores in
>>>>>>>> table 1 of this paper
>>>>>>>> (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2815481/).
>>>>>>>>
>>>>>>>> Here's the command I am using:
>>>>>>>>
>>>>>>>> ./query-umls-similarity-webinterface.pl --sab MSH --rel PAR/CHD
>>>>>>>> "Abortion" "Miscarriage"
>>>>>>>>
>>>>>>>> Default Settings:
>>>>>>>>
>>>>>>>>   --default http://atlas.ahc.umn.edu/
>>>>>>>>
>>>>>>>>   --measure path
>>>>>>>>
>>>>>>>>
>>>>>>>> User Settings:
>>>>>>>>
>>>>>>>>   --rel PAR/CHD
>>>>>>>>
>>>>>>>>
>>>>>>>> (-1.0, 'Abortion', 'Miscarriage')
>>>>>>>>
>>>>>>>> I also have not processed the text in my dataset much. I have
>>>>>>>> basically pulled diseases and phenotypes from DisGeNET, OMIM, PheWAS,
>>>>>>>> and the GWAS catalogue. If I'm using data from all of these sources, do
>>>>>>>> you recommend sending them directly to the query interface? Should I try
>>>>>>>> to map to CUI terms? (examples below)
>>>>>>>>
>>>>>>>> Before I download the database and attempt to query it (it's not a
>>>>>>>> language that I use in my current work), I just wanted an outside
>>>>>>>> perspective to see if there are best practices for using this data.
>>>>>>>> Thank you in advance for your time!
>>>>>>>>
>>>>>>>> (examples)
>>>>>>>> Here are two more examples showing the disease descriptions in my
>>>>>>>> dataset. Is the UMLS interface robust to these various formats or do
>>>>>>>> they need to be an exact match?
>>>>>>>>
>>>>>>>> ./query-umls-similarity-webinterface.pl --sab MSH --rel PAR/CHD
>>>>>>>> "Testicular Neoplasms" "Amelogenesis imperfecta local hypoplastic form"
>>>>>>>>
>>>>>>>> Default Settings:
>>>>>>>>
>>>>>>>>   --default http://atlas.ahc.umn.edu/
>>>>>>>>
>>>>>>>>   --measure path
>>>>>>>>
>>>>>>>>
>>>>>>>> User Settings:
>>>>>>>>
>>>>>>>>   --rel PAR/CHD
>>>>>>>>
>>>>>>>>
>>>>>>>> (-1.0, 'Testicular Neoplasms', 'Amelogenesis imperfecta local
>>>>>>>> hypoplastic form')
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> ./query-umls-similarity-webinterface.pl --sab MSH --rel PAR/CHD
>>>>>>>> "Hypotrichosis 2, 146520 (3)" "PERIODONTITIS, LOCALIZED AGGRESSIVE"
>>>>>>>>
>>>>>>>> Default Settings:
>>>>>>>>
>>>>>>>>   --default http://atlas.ahc.umn.edu/
>>>>>>>>
>>>>>>>>   --measure path
>>>>>>>>
>>>>>>>>
>>>>>>>> User Settings:
>>>>>>>>
>>>>>>>>   --rel PAR/CHD
>>>>>>>>
>>>>>>>>
>>>>>>>> (-1.0, 'Hypotrichosis 2, 146520 (3)', 'PERIODONTITIS, LOCALIZED
>>>>>>>> AGGRESSIVE')
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Jennifer L. Wilson
>>>>>>>> Bioengineering, Stanford University
>>>>>>>> jen.wilson...@gmail.com / 703.969.3318 <(703)%20969-3318>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Jennifer L. Wilson
>>>>> Bioengineering, Stanford University
>>>>> jen.wilson...@gmail.com / 703.969.3318 <(703)%20969-3318>
>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> Jennifer L. Wilson
>>> Bioengineering, Stanford University
>>> jen.wilson...@gmail.com / 703.969.3318 <(703)%20969-3318>
>>>
>>>
>> 
>>
>
>
>
> --
> Jennifer L. Wilson
> Bioengineering, Stanford University
> jen.wilson...@gmail.com / 703.969.3318 <(703)%20969-3318>
>



-- 
Jennifer L. Wilson
Bioengineering, Stanford University
jen.wilson...@gmail.com / 703.969.3318