Hey Ted,

I haven't quite figured out MetaMap yet, but I have a set of diseases that I mapped to CUIs another way. I'm still getting negative results (-1) for diseases that I think should be "similar". For example:

./query-umls-similarity-webinterface.pl --sab MSH --rel PAR/CHD "C1864828" "C3810041"

Default Settings:
--default http://atlas.ahc.umn.edu/
--measure path

User Settings:
--rel PAR/CHD

["b'-1", 'ALZHEIMER DISEASE 10(C1864828)', "ALZHEIMER DISEASE 18(C3810041)\\n'"]

You can see my results on the last row. Could you advise: would you expect these two CUIs not to be similar? I used the path measure as a simple starting point, but would you recommend another measure that might be more informative?

Thanks again for your help!

On Mon, Jun 5, 2017 at 1:43 PM, Jennifer Wilson <jen.wilson...@gmail.com> wrote:

> Hey Ted,
>
> Thanks for all of the help. I found the interactive interface really
> helpful and had been able to create inputs similar to what you shared. I
> have an open help ticket now on trying to get the file to download. He
> gave me some commands to try that I had already tried, so there must be
> something else to unzipping the code...
>
> Thanks again. Hopefully I'm close to a solution!
>
> On Mon, Jun 5, 2017 at 11:21 AM, Ted Pedersen duluth...@gmail.com
> [umls-similarity] <umls-similarity@yahoogroups.com> wrote:
>
>> Hi Jen,
>>
>> Nothing to be embarrassed about at all! If you haven't already used
>> MetaMap interactively you might want to try that before you attempt a
>> local install:
>>
>> https://ii.nlm.nih.gov/Interactive/UTS_Required/metamap.shtml
>>
>> (You would need to be logged into UTS for the link to work, I think...)
>>
>> Anyway, once at that site, on the right side there are some links for
>> using MetaMap interactively. Below is an example of what that looks
>> like (where the first line is my input and the rest is the output). I
>> turned on the option to show CUIs, since I think that is your desired
>> output...
>>
>> About the bz2 file, I think you'd need to uncompress that with bunzip2,
>> although I have not done a local install for a while so I am not 100
>> percent sure if that is the issue or not.
>> But, I've cc'd the MetaMap help line on this note, they are usually
>> very good about following up on issues like this.
>>
>> I hope this helps!
>> Ted
>>
>> Processing 00000000.tx.1: I have a really bad headache, and my joints ache.
>>
>> Phrase: I
>> >>>>> Phrase
>> i
>> <<<<< Phrase
>> >>>>> Mappings
>> Meta Mapping (1000):
>>   1000 C0021966:I- (Iodides) [Inorganic Chemical]
>> Meta Mapping (1000):
>>   1000 C0221138:I NOS (Blood group antibody I) [Amino Acid, Peptide, or Protein,Immunologic Factor]
>> <<<<< Mappings
>>
>> Phrase: have
>> >>>>> Phrase
>> <<<<< Phrase
>>
>> Phrase: a really bad headache,
>> >>>>> Phrase
>> really bad headache
>> <<<<< Phrase
>> >>>>> Mappings
>> Meta Mapping (790):
>>   660 C0205169:Bad [Qualitative Concept]
>>   827 C0018681:HEADACHE (Headache) [Sign or Symptom]
>> <<<<< Mappings
>>
>> Phrase: and
>> >>>>> Phrase
>> <<<<< Phrase
>>
>> Phrase: my joints
>> >>>>> Phrase
>> joints
>> <<<<< Phrase
>> >>>>> Mappings
>> Meta Mapping (1000):
>>   1000 C0022417:Joints [Body Space or Junction]
>> Meta Mapping (1000):
>>   1000 C0392905:Joints (Articular system) [Body System]
>> <<<<< Mappings
>>
>> Phrase: ache.
>> >>>>> Phrase
>> ache
>> <<<<< Phrase
>> >>>>> Mappings
>> Meta Mapping (1000):
>>   1000 C0234238:ACHE (Ache) [Sign or Symptom]
>> <<<<< Mappings
>>
>> On Mon, Jun 5, 2017 at 12:25 PM, Jennifer Wilson jen.wilson...@gmail.com
>> [umls-similarity] <umls-similarity@yahoogroups.com> wrote:
>>
>>> Hey Ted,
>>>
>>> I'm (embarrassingly) having some trouble navigating the NLM site. I
>>> think I have an account and am trying to download some of the MetaMap
>>> software (I think that the "Lite" version is sufficient). But when I
>>> download the bz2 file, it won't open because I think I need to
>>> authenticate it. Do you know how I'm supposed to access this software?
>>> Sorry if this is out of your realm, I can try someone else at NLM.
>>> This has just been a lot more difficult and confusing than I thought
>>> it should be!
>>> Thanks,
>>>
>>> On Fri, Jun 2, 2017 at 7:07 PM, Ted Pedersen duluth...@gmail.com
>>> [umls-similarity] <umls-similarity@yahoogroups.com> wrote:
>>>
>>>> Hi Jennifer,
>>>>
>>>> Mapping terms to CUIs is its own problem, and there are a few nice
>>>> tools already available that might be of some use. We've used MetaMap
>>>> to good effect for this problem, so you might want to consider
>>>> looking there.
>>>>
>>>> https://metamap.nlm.nih.gov/
>>>>
>>>> I'd be curious if other users have recommendations as well.
>>>>
>>>> Good luck,
>>>> Ted
>>>>
>>>> On Fri, Jun 2, 2017 at 7:56 PM, Jennifer Wilson jen.wilson...@gmail.com
>>>> [umls-similarity] <umls-similarity@yahoogroups.com> wrote:
>>>>
>>>>> Hi Ted,
>>>>>
>>>>> Thank you again for all of this. I'm sorry I had to put down this
>>>>> project for a few days and am only now getting back to it.
>>>>>
>>>>> I see that ontologies change and reproducing that result might not
>>>>> be the best sanity check on the scripts that I wrote.
>>>>>
>>>>> I'm going to try and figure out how to map to CUI terms and I'll be
>>>>> in touch if I get stuck again. Thanks,
>>>>>
>>>>> On Sun, May 28, 2017 at 10:59 AM, Ted Pedersen duluth...@gmail.com
>>>>> [umls-similarity] <umls-similarity@yahoogroups.com> wrote:
>>>>>
>>>>>> This is perhaps a bit more than you were looking for, but there are
>>>>>> quite a few command line tools available with UMLS::Similarity when
>>>>>> you install locally that can be helpful for digging into situations
>>>>>> like this. When I look for the path from each of these CUIs to the
>>>>>> ROOT (of MSH) I find that one of them does not have a path to the
>>>>>> root, while the other does (see command output below).
>>>>>>
>>>>>> The lack of a path to the root is going to cause a lot of measures
>>>>>> to report a -1 value (since path, for example, relies on finding
>>>>>> this path as a part of its computation).
>>>>>> In fact, not having a path to the root makes me question if
>>>>>> C0156543 is in MSH at all, so it might even be that the CUI is no
>>>>>> longer a part of MSH (and not just lacking a path to the root).
>>>>>> But, regardless, clearly something has changed since 2009 that is
>>>>>> causing this measure to return a different value. This happens in
>>>>>> some cases since UMLS continues to evolve and CUIs are added,
>>>>>> removed, etc. It's important to know what version of the UMLS a
>>>>>> previous study has used if you are interested in getting a very
>>>>>> exact comparison. In the case of our AMIA 2009 paper we used
>>>>>> 2008AB, so things have no doubt changed a bit since then.
>>>>>>
>>>>>> tpederse@maraca:~$ findPathToRoot.pl C0156543
>>>>>>
>>>>>> UMLS-Interface Configuration Information:
>>>>>> (Default Information - no config file)
>>>>>>
>>>>>> Sources (SAB):
>>>>>> MSH
>>>>>> Relations (REL):
>>>>>> PAR
>>>>>> CHD
>>>>>>
>>>>>> Sources (SABDEF):
>>>>>> UMLS_ALL
>>>>>> Relations (RELDEF):
>>>>>> UMLS_ALL
>>>>>>
>>>>>> There are no paths from the given C0156543 to the root.
>>>>>>
>>>>>> tpederse@maraca:~$ findPathToRoot.pl C0000786
>>>>>>
>>>>>> UMLS-Interface Configuration Information:
>>>>>> (Default Information - no config file)
>>>>>>
>>>>>> Sources (SAB):
>>>>>> MSH
>>>>>> Relations (REL):
>>>>>> PAR
>>>>>> CHD
>>>>>>
>>>>>> Sources (SABDEF):
>>>>>> UMLS_ALL
>>>>>> Relations (RELDEF):
>>>>>> UMLS_ALL
>>>>>>
>>>>>> The paths between abortions, spontaneous (C0000786) and the root:
>>>>>> => C0000000 (**UMLS ROOT**) C1135584 (mesh headings) C1256739 (mesh
>>>>>> descriptors) C1256741 (topical descriptor) C0012674 (diseases (mesh
>>>>>> category)) C1720765 (female urogenital dis pregnancy compl)
>>>>>> C0032962 (compl pregn) C0000786 (abortions, spontaneous)
>>>>>>
>>>>>> On Sun, May 28, 2017 at 12:43 PM, Ted Pedersen <duluth...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Jennifer,
>>>>>>>
>>>>>>> Thanks for sharing this question. I think in general if you have a
>>>>>>> choice between using CUIs or terms with UMLS::Similarity, your
>>>>>>> best option is to use the CUIs. Terms can map to multiple CUIs,
>>>>>>> and UMLS::Similarity might pick a CUI associated with a sense of
>>>>>>> the term you aren't intending. Also, if you misspell a term or
>>>>>>> don't specify it exactly correctly, then it shows up as not found.
>>>>>>> One useful resource for replicating similarity measure studies
>>>>>>> (like the one you cite) is the following page which includes term
>>>>>>> mappings for several of the datasets we've worked with over the
>>>>>>> years.
>>>>>>>
>>>>>>> http://www-users.cs.umn.edu/~bthomson/corpus/corpus.html
>>>>>>>
>>>>>>> I will admit to being a little puzzled about the case of abortion -
>>>>>>> miscarriage. The paper you cite clearly reports a value based on
>>>>>>> MSH, but as I try to run that query now I get a value of -1 (even
>>>>>>> when using the CUIs).
>>>>>>> However, it appears that each of the CUIs is found in MSH, but
>>>>>>> that somehow we are not able to compute some of the measures (a
>>>>>>> path length, for example). This suggests that there is not a path
>>>>>>> between the two CUIs, which has something to do with the structure
>>>>>>> of UMLS/MSH.
>>>>>>>
>>>>>>> One quick and dirty way to see if a CUI is in MSH is to find the
>>>>>>> path length between a CUI and itself. If it is present in MSH,
>>>>>>> that value will be 1. We see that for each of the CUIs used for
>>>>>>> abortion and miscarriage.
>>>>>>>
>>>>>>> tpederse@maraca:~$ perl query-umls-similarity-webinterface.pl --measure path --sab MSH C0156543 C0156543
>>>>>>> Default Settings:
>>>>>>> --default http://atlas.ahc.umn.edu/
>>>>>>> --rel PAR/CHD
>>>>>>> User Settings:
>>>>>>> --measure path
>>>>>>>
>>>>>>> 1<>Unspecified abortion NOS(C0156543)<>Unspecified abortion NOS(C0156543)
>>>>>>>
>>>>>>> tpederse@maraca:~$ perl query-umls-similarity-webinterface.pl --measure path --sab MSH C0000786 C0000786
>>>>>>> Default Settings:
>>>>>>> --default http://atlas.ahc.umn.edu/
>>>>>>> --rel PAR/CHD
>>>>>>> User Settings:
>>>>>>> --measure path
>>>>>>>
>>>>>>> 1<>Abortions.spontaneous(C0000786)<>Abortions.spontaneous(C0000786)
>>>>>>>
>>>>>>> However, when I try to find the path length between the two CUIs,
>>>>>>> I get -1. This suggests that the CUIs are not joined by PAR/CHD
>>>>>>> relations. Note that below you can see that the terms for the CUIs
>>>>>>> have been looked up, which shows us that MSH knows about them...
>>>>>>>
>>>>>>> tpederse@maraca:~$ perl query-umls-similarity-webinterface.pl --measure path --sab MSH C0156543 C0000786
>>>>>>> Default Settings:
>>>>>>> --default http://atlas.ahc.umn.edu/
>>>>>>> --rel PAR/CHD
>>>>>>> User Settings:
>>>>>>> --measure path
>>>>>>>
>>>>>>> -1<>Unspecified abortion NOS(C0156543)<>Abortions.spontaneous(C0000786)
>>>>>>>
>>>>>>> So, in any case, it would appear that something has changed in the
>>>>>>> structure of MSH since we reported our results in the 2009 AMIA
>>>>>>> paper you mention. I'm not sure what that is. But, I think the
>>>>>>> general message is that if you can use CUIs it will normally be
>>>>>>> more reliable to do that. Mapping terms to CUIs is of course its
>>>>>>> own problem, but UMLS::Similarity doesn't do anything terribly
>>>>>>> fancy with that, and so probably whatever you do will be more
>>>>>>> extensive and reliable than what UMLS::Similarity would do...
>>>>>>>
>>>>>>> I hope this helps somehow, and please do feel free to follow up.
>>>>>>> Thoughts from other users on this issue would also be most welcome!
>>>>>>>
>>>>>>> Cordially,
>>>>>>> Ted
>>>>>>>
>>>>>>> On Sat, May 27, 2017 at 12:18 PM, Jennifer Wilson
>>>>>>> jen.wilson...@gmail.com [umls-similarity]
>>>>>>> <umls-similarity@yahoogroups.com> wrote:
>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> I'm resending this now that I'm subscribed. Any advice would be
>>>>>>>> much appreciated! Thank you,
>>>>>>>>
>>>>>>>> ---------- Forwarded message ----------
>>>>>>>> From: Jennifer Wilson <jen.wilson...@gmail.com>
>>>>>>>> Date: Tue, May 23, 2017 at 6:13 PM
>>>>>>>> Subject: Help with the best approach for using the query-UMLS interface
>>>>>>>> To: umls-similarity@yahoogroups.com
>>>>>>>>
>>>>>>>> Hello UMLS similarity team,
>>>>>>>>
>>>>>>>> I am trying to compute the similarity between ~30K
>>>>>>>> disease/phenotype terms.
>>>>>>>> Ideally, I would have a matrix of similarity for these terms.
>>>>>>>>
>>>>>>>> My first attempt was to write a Python script to call the
>>>>>>>> query-umls-similarity-webinterface.pl script. Before releasing
>>>>>>>> the script on my dataset, though, I was trying to recreate the
>>>>>>>> scores in table 1 of this paper
>>>>>>>> (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2815481/).
>>>>>>>>
>>>>>>>> Here's the command I am using:
>>>>>>>>
>>>>>>>> ./query-umls-similarity-webinterface.pl --sab MSH --rel PAR/CHD "Abortion" "Miscarriage"
>>>>>>>>
>>>>>>>> Default Settings:
>>>>>>>> --default http://atlas.ahc.umn.edu/
>>>>>>>> --measure path
>>>>>>>>
>>>>>>>> User Settings:
>>>>>>>> --rel PAR/CHD
>>>>>>>>
>>>>>>>> (-1.0, 'Abortion', 'Miscarriage')
>>>>>>>>
>>>>>>>> I also have not processed the text in my dataset much. I have
>>>>>>>> basically pulled diseases and phenotypes from DisGeNet, OMIM,
>>>>>>>> PheWas, and the GWAS catalogue. If I'm using data from all of
>>>>>>>> these sources, do you recommend sending them directly to the
>>>>>>>> query interface? Should I try and map to CUI terms? (examples
>>>>>>>> below)
>>>>>>>>
>>>>>>>> Before I download the database and attempt to query the database
>>>>>>>> (it's not a language that I use in my current work), I just
>>>>>>>> wanted an outside perspective to see if there are best practices
>>>>>>>> for using this data. Thank you in advance for your time!
>>>>>>>>
>>>>>>>> (examples)
>>>>>>>> Here are two more examples showing the disease descriptions in my
>>>>>>>> dataset. Is the UMLS interface robust to these various formats or
>>>>>>>> do they need to be an exact match?
>>>>>>>>
>>>>>>>> ./query-umls-similarity-webinterface.pl --sab MSH --rel PAR/CHD "Testicular Neoplasms" "Amelogenesis imperfecta local hypoplastic form"
>>>>>>>>
>>>>>>>> Default Settings:
>>>>>>>> --default http://atlas.ahc.umn.edu/
>>>>>>>> --measure path
>>>>>>>>
>>>>>>>> User Settings:
>>>>>>>> --rel PAR/CHD
>>>>>>>>
>>>>>>>> (-1.0, 'Testicular Neoplasms', 'Amelogenesis imperfecta local hypoplastic form')
>>>>>>>>
>>>>>>>> ./query-umls-similarity-webinterface.pl --sab MSH --rel PAR/CHD "Hypotrichosis 2, 146520 (3)" "PERIODONTITIS, LOCALIZED AGGRESSIVE"
>>>>>>>>
>>>>>>>> Default Settings:
>>>>>>>> --default http://atlas.ahc.umn.edu/
>>>>>>>> --measure path
>>>>>>>>
>>>>>>>> User Settings:
>>>>>>>> --rel PAR/CHD
>>>>>>>>
>>>>>>>> (-1.0, 'Hypotrichosis 2, 146520 (3)', 'PERIODONTITIS, LOCALIZED AGGRESSIVE')
>>>>>>>>
>>>>>>>> --
>>>>>>>> Jennifer L. Wilson
>>>>>>>> Bioengineering, Stanford University
>>>>>>>> jen.wilson...@gmail.com / 703.969.3318

--
Jennifer L. Wilson
Bioengineering, Stanford University
jen.wilson...@gmail.com / 703.969.3318
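
A note for anyone scripting these queries from Python, as in the original question: the `["b'-1", ...]` list in the message at the top looks like what Python 3 produces when the script's raw bytes are passed through `str()` and split on `<>` without decoding first. That reading is a guess, since the calling script is not shown. Below is a minimal sketch of decoding and parsing the result line instead. The `score<>term(CUI)<>term(CUI)` layout is inferred only from the outputs quoted in this thread, and `parse_result` and `query_path` are hypothetical helper names, not part of UMLS::Similarity.

```python
import re
import subprocess

# Result lines in this thread look like: score<>term1(CUI1)<>term2(CUI2)
# (layout inferred from the quoted outputs; treat it as an assumption).
RESULT_RE = re.compile(
    r"^(-?\d+(?:\.\d+)?)<>(.*)\((C\d{7})\)<>(.*)\((C\d{7})\)$"
)


def parse_result(raw):
    """Return (score, cui1, cui2) from script output, or None if no result line.

    A negative score (the -1 seen in this thread) is mapped to None so that
    "no value could be computed" is not mistaken for a real similarity.
    """
    # subprocess returns bytes by default; decode before any string handling.
    text = raw.decode("utf-8") if isinstance(raw, bytes) else raw
    for line in text.splitlines():
        match = RESULT_RE.match(line.strip())
        if match:
            score = float(match.group(1))
            return (None if score < 0 else score, match.group(3), match.group(5))
    return None


def query_path(cui1, cui2, script="./query-umls-similarity-webinterface.pl"):
    """Run the web-interface script for one CUI pair (hypothetical wrapper).

    Uses only the flags shown in this thread: --sab, --rel, --measure.
    """
    completed = subprocess.run(
        [script, "--sab", "MSH", "--rel", "PAR/CHD", "--measure", "path", cui1, cui2],
        capture_output=True,
        check=True,
    )
    return parse_result(completed.stdout)
```

With this, Ted's self-comparison check becomes `query_path("C0156543", "C0156543")`, where a score of 1 suggests the CUI is present in MSH, and a score of `None` corresponds to the -1 results quoted above.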