Hi Ted,

Thanks for your answer. On a related note, is there any way to query a local branch? Say that I have Diabetes Mellitus: can I find branches that are close to that disease without exhaustively searching all diseases?
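One way to frame the "local branch" question is as a bounded walk over PAR/CHD links that starts at the seed concept and stops after a few hops. The Python sketch below shows this on a hypothetical, hand-written toy hierarchy. The CHILDREN table, its labels, and the helper names are made up for illustration. They are not real CUIs, not the actual MSH structure, and not part of UMLS::Similarity. With a real installation, the edges would have to come from the UMLS PAR/CHD relations of the chosen source.

```python
from collections import deque

# Hypothetical toy PAR/CHD edges (parent -> children). Labels are illustrative
# only: they are not real CUIs and not the actual MSH hierarchy.
CHILDREN = {
    "Endocrine Diseases": ["Diabetes Mellitus", "Thyroid Diseases"],
    "Diabetes Mellitus": ["Diabetes Mellitus, Type 1", "Diabetes Mellitus, Type 2"],
    "Thyroid Diseases": ["Hypothyroidism"],
}


def build_neighbors(children):
    """Undirected adjacency map, so the walk can move up (PAR) or down (CHD)."""
    neighbors = {}
    for parent, kids in children.items():
        for kid in kids:
            neighbors.setdefault(parent, set()).add(kid)
            neighbors.setdefault(kid, set()).add(parent)
    return neighbors


def local_branch(seed, children, max_hops=2):
    """Breadth-first search from seed.

    Returns {concept: hop count} for every concept within max_hops links of
    seed. Nothing beyond max_hops is visited, so the whole hierarchy is never
    scanned.
    """
    neighbors = build_neighbors(children)
    hops = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand past the hop limit
        for nxt in neighbors.get(node, ()):
            if nxt not in hops:
                hops[nxt] = hops[node] + 1
                queue.append(nxt)
    return hops
```

With `max_hops=2`, `local_branch("Diabetes Mellitus", CHILDREN)` reaches the two diabetes subtypes and the parent category at one hop and the sibling "Thyroid Diseases" at two hops, but not "Hypothyroidism". The candidates it returns could then be scored with a similarity measure instead of scoring every disease.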
Thanks again,

On Wed, Aug 16, 2017 at 7:50 AM, Ted Pedersen duluth...@gmail.com [umls-similarity] <umls-similarity@yahoogroups.com> wrote:

Hi Jen,

A great question, but unfortunately we do not have any pre-computed files of distances. This would be a good thing to have available, but we just haven't done that. I'm not aware of anyone else who has done that, but I'll ask (via this email). If anyone has done that and is able to share, that would be quite helpful I think.

Good luck,
Ted

On Tue, Aug 15, 2017 at 7:04 PM, Jennifer Wilson jen.wilson...@gmail.com [umls-similarity] <umls-similarity@yahoogroups.com> wrote:

Hi Ted,

I'm reviving this old email thread since this work is becoming relevant to my project again. I realized I never asked: do you have any flat files of pre-calculated disease distances?

I'm looking to cluster my list of MeSH-termed diseases (all pulled from DisGeNet) into groups of related diseases. For instance, I might want to group 'Diabetes Mellitus' with 'Diabetes Mellitus, Non-Insulin Dependent' and have a separate group for things such as 'Depressive Symptoms' and 'Depressive Episodes'. Do you have an easy way to create these clusters?

Thank you again for your help!

On Mon, Jun 5, 2017 at 5:50 PM, Ted Pedersen duluth...@gmail.com [umls-similarity] <umls-similarity@yahoogroups.com> wrote:

When I am just trying to get a sense of a measure or test out something we've added, I often use FMA (Foundational Model of Anatomy) as my source. This is because it includes some fairly intuitive terms and is structured hierarchically, so similarity measures like path and wup work fairly nicely. I tend to prefer wup over path since wup includes a kind of correction for the depth of the concepts involved, but at this point that might be a finer point.
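To make the "correction for depth" concrete, here is a minimal sketch using the common textbook forms of the two measures. The assumed forms are path = 1 / (number of nodes on the shortest path) and wup = 2 · depth(LCS) / (depth(a) + depth(b)), with depth counted in nodes and the root at depth 1. The hierarchy is a hypothetical toy tree, not FMA. UMLS::Similarity's exact depth conventions and its handling of multiple parents may differ, so these numbers are not meant to reproduce the scores in this thread.

```python
# Hypothetical toy tree (child -> single parent); not FMA or MSH.
PARENT = {"b": "root", "c": "b", "d": "c", "e": "c", "f": "root"}


def ancestors(node):
    """Return [node, parent, grandparent, ..., root]."""
    chain = [node]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def depth(node):
    """Depth counted in nodes, with the root at depth 1 (one common convention)."""
    return len(ancestors(node))


def lcs(a, b):
    """Least common subsumer: the first ancestor of a that is also an ancestor of b."""
    b_ancestors = set(ancestors(b))
    for node in ancestors(a):
        if node in b_ancestors:
            return node
    return None


def path_sim(a, b):
    """1 / (nodes on the path a -> LCS -> b); -1 if the two are not connected."""
    common = lcs(a, b)
    if common is None:
        return -1
    nodes = depth(a) + depth(b) - 2 * depth(common) + 1
    return 1 / nodes


def wup_sim(a, b):
    """Wu-Palmer: 2 * depth(LCS) / (depth(a) + depth(b)); -1 if not connected."""
    common = lcs(a, b)
    if common is None:
        return -1
    return 2 * depth(common) / (depth(a) + depth(b))
```

The deep siblings `d`/`e` and the shallow siblings `b`/`f` are both three nodes apart, so `path_sim` gives 1/3 for each. In contrast, `wup_sim` gives 0.75 for the deep pair and 0.5 for the shallow pair, because the deep pair shares a more specific ancestor. A node missing from the tree, such as `"z"`, has no common ancestor with anything and scores -1, which mirrors the -1 values discussed later in the thread.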
Below are some examples of intuitive results which I think make some sense at least, and might be a good starting point for exploring.

tpederse@maraca:~$ query-umls-similarity-webinterface.pl --sab FMA --measure wup femur skull
Default Settings:
--default http://atlas.ahc.umn.edu/
--rel PAR/CHD
User Settings:
--measure wup

0.8<>femur(C0015811)<>skull(C0037303)

tpederse@maraca:~$ query-umls-similarity-webinterface.pl --sab FMA --measure wup femur bone
Default Settings:
--default http://atlas.ahc.umn.edu/
--rel PAR/CHD
User Settings:
--measure wup

0.8333<>femur(C0015811)<>bone(C0262950)

tpederse@maraca:~$ query-umls-similarity-webinterface.pl --sab FMA --measure wup skull bone
Default Settings:
--default http://atlas.ahc.umn.edu/
--rel PAR/CHD
User Settings:
--measure wup

0.8696<>skull(C0037303)<>bone(C0262950)

tpederse@maraca:~$ query-umls-similarity-webinterface.pl --sab FMA --measure wup finger hand
Default Settings:
--default http://atlas.ahc.umn.edu/
--rel PAR/CHD
User Settings:
--measure wup

0.6923<>finger(C0016129)<>hand(C0018563)

tpederse@maraca:~$ query-umls-similarity-webinterface.pl --sab FMA --measure wup toe foot
Default Settings:
--default http://atlas.ahc.umn.edu/
--rel PAR/CHD
User Settings:
--measure wup

0.6923<>toe(C0040357)<>foot(C0016504)

On Mon, Jun 5, 2017 at 7:30 PM, Ted Pedersen <duluth...@gmail.com> wrote:

Hi Jen,

I looked at those particular CUIs and don't think they are in MSH or SNOMEDCT; that's why you are getting the -1 even though one would imagine there is some similarity between them. To find some other examples using Alzheimer's, I used the UTS Metathesaurus to look up CUIs in MSH that included the term Alzheimer's (and 9 were found in MSH).
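Regarding the request to cluster related diseases: once pairwise scores exist, one simple option is single-linkage grouping at a similarity threshold. This option is swapped in here for illustration and was not proposed in the thread. The sketch below reuses the five FMA wup scores quoted above as its only data. Pairs that were not queried are treated as unlinked, and the threshold is an arbitrary choice that would need tuning on real data.

```python
# wup scores copied from the FMA example outputs quoted in this thread.
FMA_WUP = {
    ("femur", "skull"): 0.8,
    ("femur", "bone"): 0.8333,
    ("skull", "bone"): 0.8696,
    ("finger", "hand"): 0.6923,
    ("toe", "foot"): 0.6923,
}


def threshold_clusters(scores, threshold):
    """Single-linkage grouping with union-find.

    Two terms share a cluster if a chain of pairs scoring >= threshold connects
    them. Pairs scoring -1 (no path found) never link anything.
    """
    parent = {}

    def find(term):
        parent.setdefault(term, term)
        while parent[term] != term:
            parent[term] = parent[parent[term]]  # path halving keeps trees shallow
            term = parent[term]
        return term

    for (a, b), score in scores.items():
        root_a, root_b = find(a), find(b)  # also registers terms that never link
        if score >= threshold and root_a != root_b:
            parent[root_a] = root_b

    groups = {}
    for term in list(parent):
        groups.setdefault(find(term), set()).add(term)
    return sorted(sorted(group) for group in groups.values())
```

At a threshold of 0.75, only femur, skull, and bone group together. At 0.65, the finger/hand and toe/foot pairs also form their own groups. Single linkage can chain loosely related terms together on large datasets, so average-linkage or other clustering methods may be worth comparing.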
I took 2 of those and ran them with path and got -1, indicating that no path was found. However, when I used lesk or vector I found non-zero values. Lesk and vector are both based on comparing the definitions of two CUIs and do not rely on finding paths.

tpederse@maraca:~$ perl query-umls-similarity-webinterface.pl C0002395 C0299337 --measure vector --sab MSH
Default Settings:
--default http://atlas.ahc.umn.edu/
--rel CUI/PAR/CHD/RB/RN
User Settings:
--measure vector

0.3131<>Disease, Alzheimer's(C0002395)<>familial Alzheimer's disease protein 1(C0299337)

tpederse@maraca:~$ perl query-umls-similarity-webinterface.pl C0002395 C0299337 --measure lesk --sab MSH
Default Settings:
--default http://atlas.ahc.umn.edu/
--rel CUI/PAR/CHD/RB/RN
User Settings:
--measure lesk

19<>Disease, Alzheimer's(C0002395)<>familial Alzheimer's disease protein 1(C0299337)

So, the tricky part is sometimes the coverage in different sources: two CUIs might be intuitively similar but simply not be found in the source being used (or no path may exist between them), so they will show a -1 value.

I'm not sure this exactly answers your question, but I will think a little more and add what I can...

More soon,
Ted

On Mon, Jun 5, 2017 at 5:41 PM, Jennifer Wilson jen.wilson...@gmail.com [umls-similarity] <umls-similarity@yahoogroups.com> wrote:

Hey Ted,

So I haven't quite figured out MetaMap, but I have a set of diseases that I mapped to CUIs another way. I'm still getting negative results with diseases that I think should be "similar".
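Ted's point that lesk and vector compare definitions instead of walking paths can be illustrated with a deliberately simplified sketch. It uses plain word overlap as a stand-in for lesk and bag-of-words cosine as a stand-in for vector. Both stand-ins are assumptions made for illustration. The real measures are more elaborate: as the `--rel CUI/PAR/CHD/RB/RN` setting in the output above suggests, they also draw on the definitions of related concepts. The definition strings in the usage check are made up.

```python
import math
from collections import Counter

PUNCT = ".,;:()"


def tokens(text):
    """Lowercased word tokens with surrounding punctuation stripped."""
    words = (w.strip(PUNCT).lower() for w in text.split())
    return [w for w in words if w]


def overlap_score(def_a, def_b):
    """Lesk-style stand-in: number of shared tokens, counted with multiplicity."""
    a, b = Counter(tokens(def_a)), Counter(tokens(def_b))
    return sum((a & b).values())


def cosine_score(def_a, def_b):
    """Vector-style stand-in: cosine of bag-of-words count vectors."""
    a, b = Counter(tokens(def_a)), Counter(tokens(def_b))
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0
```

For two made-up definitions, "A progressive brain disorder causing memory loss." and "A protein linked to a brain disorder.", the overlap score is 3 ("a", "brain", "disorder"). No hierarchy is consulted, which is why such measures can return non-zero values where path returns -1.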
For example:

./query-umls-similarity-webinterface.pl --sab MSH --rel PAR/CHD "C1864828" "C3810041"

Default Settings:
--default http://atlas.ahc.umn.edu/
--measure path

User Settings:
--rel PAR/CHD

["b'-1", 'ALZHEIMER DISEASE 10(C1864828)', "ALZHEIMER DISEASE 18(C3810041)\\n'"]

You can see my results on the last row. Could you advise: would you expect these two CUIs not to be similar? I wanted to measure path as a simple starting point, but could you recommend another measure that might be more informative? Thanks again for your help!

On Mon, Jun 5, 2017 at 1:43 PM, Jennifer Wilson <jen.wilson...@gmail.com> wrote:

Hey Ted,

Thanks for all of the help. I found the interactive interface really helpful and have been able to create inputs similar to what you shared. I have an open help ticket now on trying to get the file to download. He gave me some commands to try that I had already tried, so there must be something else to unzipping the code...

Thanks again. Hopefully I'm close to a solution!

On Mon, Jun 5, 2017 at 11:21 AM, Ted Pedersen duluth...@gmail.com [umls-similarity] <umls-similarity@yahoogroups.com> wrote:

Hi Jen,

Nothing to be embarrassed about at all! If you haven't already used MetaMap interactively, you might want to try that before you attempt a local install:

https://ii.nlm.nih.gov/Interactive/UTS_Required/metamap.shtml

(You would need to be logged into UTS for the link to work, I think...)

Anyway, once you are at that site, there are some links on the right side for using MetaMap interactively.
Below is an example of what that looks like (where the first line is my input and the rest is the output). I turned on the option to show CUIs, since I think that is your desired output...

About the bz2 file, I think you'd need to uncompress that with bunzip2, although I have not done a local install for a while, so I am not 100 percent sure whether that is the issue. But I've cc'd the MetaMap help line on this note; they are usually very good about following up on issues like this.

I hope this helps!
Ted

Processing 00000000.tx.1: I have a really bad headache, and my joints ache.

Phrase: I
>>>>> Phrase
i
<<<<< Phrase
>>>>> Mappings
Meta Mapping (1000):
1000 C0021966:I- (Iodides) [Inorganic Chemical]
Meta Mapping (1000):
1000 C0221138:I NOS (Blood group antibody I) [Amino Acid, Peptide, or Protein,Immunologic Factor]
<<<<< Mappings

Phrase: have
>>>>> Phrase
<<<<< Phrase

Phrase: a really bad headache,
>>>>> Phrase
really bad headache
<<<<< Phrase
>>>>> Mappings
Meta Mapping (790):
660 C0205169:Bad [Qualitative Concept]
827 C0018681:HEADACHE (Headache) [Sign or Symptom]
<<<<< Mappings

Phrase: and
>>>>> Phrase
<<<<< Phrase

Phrase: my joints
>>>>> Phrase
joints
<<<<< Phrase
>>>>> Mappings
Meta Mapping (1000):
1000 C0022417:Joints [Body Space or Junction]
Meta Mapping (1000):
1000 C0392905:Joints (Articular system) [Body System]
<<<<< Mappings

Phrase: ache.
>>>>> Phrase
ache
<<<<< Phrase
>>>>> Mappings
Meta Mapping (1000):
1000 C0234238:ACHE (Ache) [Sign or Symptom]
<<<<< Mappings

On Mon, Jun 5, 2017 at 12:25 PM, Jennifer Wilson jen.wilson...@gmail.com [umls-similarity] <umls-similarity@yahoogroups.com> wrote:

Hey Ted,

I'm (embarrassingly) having some trouble navigating the NLM site. I think I have an account and am trying to download some of the MetaMap software (I think that the "Lite" version is sufficient). But when I download the bz2 file, it won't open, because I think I need to authenticate it. Do you know how I'm supposed to access this software? Sorry if this is out of your realm; I can try someone else at NLM. This has just been a lot more difficult and confusing than I thought it would be! Thanks,

On Fri, Jun 2, 2017 at 7:07 PM, Ted Pedersen duluth...@gmail.com [umls-similarity] <umls-similarity@yahoogroups.com> wrote:

Hi Jennifer,

Mapping terms to CUIs is its own problem, and there are a few nice tools already available that might be of some use. We've used MetaMap to good effect for this problem, so you might want to consider looking there.

https://metamap.nlm.nih.gov/

I'd be curious if other users have recommendations as well...

Good luck,
Ted

On Fri, Jun 2, 2017 at 7:56 PM, Jennifer Wilson jen.wilson...@gmail.com [umls-similarity] <umls-similarity@yahoogroups.com> wrote:

Hi Ted,

Thank you again for all of this.
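If MetaMap's human-readable output like the example above is saved to a file, the CUIs can be pulled out with a short regular expression. The sketch below assumes exactly the line shapes shown in this thread: `Phrase: ...` headers and mapping lines of the form `score CUI:text [semantic types]`. It has only been reasoned about against that sample. MetaMap also offers machine-readable output formats, which are probably a safer basis for batch work, so check the MetaMap documentation before relying on this.

```python
import re

# Mapping lines as they appear in the sample above, e.g.
# "827 C0018681:HEADACHE (Headache) [Sign or Symptom]"
MAPPING_LINE = re.compile(
    r"^\s*(?P<score>\d+)\s+(?P<cui>C\d{7}):(?P<text>[^\[]*?)\s*\[(?P<types>[^\]]*)\]\s*$"
)


def parse_mappings(output):
    """Return (phrase, score, cui, text, semantic_types) tuples.

    semantic_types is kept as the raw bracketed string, because type names
    such as "Amino Acid, Peptide, or Protein" themselves contain commas.
    """
    phrase = None
    results = []
    for line in output.splitlines():
        if line.startswith("Phrase:"):
            phrase = line[len("Phrase:"):].strip()
            continue
        match = MAPPING_LINE.match(line)
        if match:
            results.append(
                (phrase, int(match["score"]), match["cui"], match["text"], match["types"])
            )
    return results
```

Applied to the "a really bad headache," block above, this yields two tuples: C0205169 ("Bad", score 660) and C0018681 ("HEADACHE (Headache)", score 827). Header lines such as `Meta Mapping (790):` and `>>>>> Mappings` are skipped.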
I'm sorry I had to put this project down for a few days and am only now getting back to it.

I see that ontologies change, and reproducing that result might not be the best sanity check on the scripts that I wrote.

I'm going to try to figure out how to map to CUIs, and I'll be in touch if I get stuck again. Thanks,

On Sun, May 28, 2017 at 10:59 AM, Ted Pedersen duluth...@gmail.com [umls-similarity] <umls-similarity@yahoogroups.com> wrote:

This is perhaps a bit more than you were looking for, but when you install UMLS::Similarity locally, quite a few command-line tools come with it that can be helpful for digging into situations like this. When I look for the path from each of these CUIs to the root (of MSH), I find that one of them does not have a path to the root, while the other does (see the command output below).

The lack of a path to the root is going to cause a lot of measures to report a -1 value (since path, for example, relies on finding this path as part of its computation). In fact, not having a path to the root makes me question whether C0156543 is in MSH at all, so it might even be that the CUI is no longer part of MSH (and not just lacking a path to the root). But regardless, clearly something has changed since 2009 that is causing this measure to return a different value. This happens in some cases, since the UMLS continues to evolve and CUIs are added, removed, etc.
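The "path to the root" idea can be sketched as a recursive walk up parent links that collects every chain ending at the root. In the sketch below, the PARENTS table is rebuilt only from the single path that findPathToRoot.pl prints for C0000786 in the output that follows. It is a fragment, not the full MSH hierarchy, and the function is a hypothetical illustration, not how findPathToRoot.pl is implemented. C0156543 is given no parents, which mirrors the "no paths ... to the root" result.

```python
# Child -> parents, rebuilt only from the one path printed by findPathToRoot.pl
# for C0000786 in this thread; the real MSH hierarchy has many more links.
PARENTS = {
    "C0000786": ["C0032962"],
    "C0032962": ["C1720765"],
    "C1720765": ["C0012674"],
    "C0012674": ["C1256741"],
    "C1256741": ["C1256739"],
    "C1256739": ["C1135584"],
    "C1135584": ["C0000000"],
}
ROOT = "C0000000"  # **UMLS ROOT** in the tool's output


def paths_to_root(cui, parents=PARENTS, root=ROOT, _seen=frozenset()):
    """Return every parent chain from cui up to root, listed root-first.

    An empty list means there is no path to the root, which is the case in
    which path-based measures end up reporting -1.
    """
    if cui == root:
        return [[root]]
    if cui in _seen:  # guard against cycles in messy relation data
        return []
    paths = []
    for parent in parents.get(cui, []):
        for path in paths_to_root(parent, parents, root, _seen | {cui}):
            paths.append(path + [cui])
    return paths
```

`paths_to_root("C0000786")` returns one eight-concept chain from C0000000 down to C0000786, and `paths_to_root("C0156543")` returns an empty list. With multiple parents per concept, the function would return several chains, which is why the tool's message speaks of "paths".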
It's important to know which version of the UMLS a previous study used if you are interested in getting a very exact comparison. In the case of our AMIA 2009 paper we used 2008AB, so things have no doubt changed a bit since then.

tpederse@maraca:~$ findPathToRoot.pl C0156543

UMLS-Interface Configuration Information:
(Default Information - no config file)

Sources (SAB):
MSH
Relations (REL):
PAR
CHD

Sources (SABDEF):
UMLS_ALL
Relations (RELDEF):
UMLS_ALL

There are no paths from the given C0156543 to the root.

tpederse@maraca:~$ findPathToRoot.pl C0000786

UMLS-Interface Configuration Information:
(Default Information - no config file)

Sources (SAB):
MSH
Relations (REL):
PAR
CHD

Sources (SABDEF):
UMLS_ALL
Relations (RELDEF):
UMLS_ALL

The paths between abortions, spontaneous (C0000786) and the root:
=> C0000000 (**UMLS ROOT**) C1135584 (mesh headings) C1256739 (mesh descriptors) C1256741 (topical descriptor) C0012674 (diseases (mesh category)) C1720765 (female urogenital dis pregnancy compl) C0032962 (compl pregn) C0000786 (abortions, spontaneous)

On Sun, May 28, 2017 at 12:43 PM, Ted Pedersen <duluth...@gmail.com> wrote:

Hi Jennifer,

Thanks for sharing this question.
I think in general, if you have a choice between using CUIs or terms with UMLS::Similarity, your best option is to use the CUIs. Terms can map to multiple CUIs, and UMLS::Similarity might pick a CUI associated with a sense of the term you aren't intending. Also, if you misspell a term or don't specify it exactly, then it shows up as not found. One useful resource for replicating similarity measure studies (like the one you cite) is the following page, which includes term mappings for several of the datasets we've worked with over the years.

http://www-users.cs.umn.edu/~bthomson/corpus/corpus.html

I will admit to being a little puzzled about the case of abortion and miscarriage. The paper you cite clearly reports a value based on MSH, but when I run that query now I get a value of -1 (even when using the CUIs). However, it appears that each of the CUIs is found in MSH, but that somehow we are not able to compute some of the measures (a path length, for example). This suggests that there is no path between the two CUIs, which has something to do with the structure of UMLS/MSH.

One quick and dirty way to see if a CUI is in MSH is to find the path length between a CUI and itself. If it is present in MSH, that value will be 1. We see that for each of the CUIs used for abortion and miscarriage.
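The self-comparison check can be mirrored in a small sketch of a path-style score. The assumed form is 1 / (number of nodes on the shortest PAR/CHD path), with -1 when either CUI is unknown to the source or no path connects them. A CUI compared with itself is a one-node path, so it scores 1. The edge list below is a short fragment of the MSH path printed by findPathToRoot.pl in this thread, and C0156543 is included as a known concept with no edges. One simplification is stated as an assumption: links are treated as undirected. The real measure routes through a shared ancestor, so in multi-parent hierarchies the numbers can differ.

```python
from collections import deque

# Fragment of the MSH path printed by findPathToRoot.pl in this thread
# (parent, child); C0156543 is known to the source but has no edges here.
EDGES = [("C0012674", "C1720765"), ("C1720765", "C0032962"), ("C0032962", "C0000786")]
CONCEPTS = {"C0012674", "C1720765", "C0032962", "C0000786", "C0156543"}


def path_similarity(a, b, edges=EDGES, concepts=CONCEPTS):
    """1 / (nodes on the shortest PAR/CHD path between a and b).

    Returns -1 if either CUI is not in the source or no path connects them.
    Simplification: links are treated as undirected.
    """
    if a not in concepts or b not in concepts:
        return -1
    graph = {c: set() for c in concepts}
    for parent, child in edges:
        graph[parent].add(child)
        graph[child].add(parent)
    nodes_on_path = {a: 1}  # a path from a to itself contains one node
    queue = deque([a])
    while queue:
        node = queue.popleft()
        if node == b:
            return 1 / nodes_on_path[node]
        for nxt in graph[node]:
            if nxt not in nodes_on_path:
                nodes_on_path[nxt] = nodes_on_path[node] + 1
                queue.append(nxt)
    return -1
```

Here `path_similarity("C0156543", "C0156543")` and `path_similarity("C0000786", "C0000786")` both return 1, while `path_similarity("C0156543", "C0000786")` returns -1. This is the same pattern as the three command outputs that follow.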
tpederse@maraca:~$ perl query-umls-similarity-webinterface.pl --measure path --sab MSH C0156543 C0156543
Default Settings:
--default http://atlas.ahc.umn.edu/
--rel PAR/CHD
User Settings:
--measure path

1<>Unspecified abortion NOS(C0156543)<>Unspecified abortion NOS(C0156543)

tpederse@maraca:~$ perl query-umls-similarity-webinterface.pl --measure path --sab MSH C0000786 C0000786
Default Settings:
--default http://atlas.ahc.umn.edu/
--rel PAR/CHD
User Settings:
--measure path

1<>Abortions.spontaneous(C0000786)<>Abortions.spontaneous(C0000786)

However, when I try to find the path length between the two CUIs, I get -1. This suggests that the CUIs are not joined by PAR/CHD relations. Note that below you can see that the terms for the CUIs have been looked up, which shows us that MSH knows about them...

tpederse@maraca:~$ perl query-umls-similarity-webinterface.pl --measure path --sab MSH C0156543 C0000786
Default Settings:
--default http://atlas.ahc.umn.edu/
--rel PAR/CHD
User Settings:
--measure path

-1<>Unspecified abortion NOS(C0156543)<>Abortions.spontaneous(C0000786)

So, in any case, it would appear that something has changed in the structure of MSH since we reported our results in the 2009 AMIA paper you mention. I'm not sure what that is. But I think the general message is that if you can use CUIs, it will normally be more reliable to do that.
Mapping terms to CUIs is of course its own problem, but UMLS::Similarity doesn't do anything terribly fancy with that, so whatever you do will probably be more extensive and reliable than what UMLS::Similarity would do...

I hope this helps somehow, and please do feel free to follow up. Thoughts from other users on this issue would also be most welcome!

Cordially,
Ted

On Sat, May 27, 2017 at 12:18 PM, Jennifer Wilson jen.wilson...@gmail.com [umls-similarity] <umls-similarity@yahoogroups.com> wrote:

Hi all,

I'm resending this now that I'm subscribed. Any advice would be much appreciated! Thank you,

---------- Forwarded message ----------
From: Jennifer Wilson <jen.wilson...@gmail.com>
Date: Tue, May 23, 2017 at 6:13 PM
Subject: Help with the best approach for using the query-UMLS interface
To: umls-similarity@yahoogroups.com

Hello UMLS similarity team,

I am trying to compute the similarity between ~30K disease/phenotype terms. Ideally, I would have a matrix of similarity scores for these terms.

My first attempt was to write a Python script to call the query-umls-similarity-webinterface.pl script. However, before running the script on my dataset, I tried to recreate the scores in Table 1 of this paper (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2815481/).
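A Python wrapper around the script can call it with an argument list and decode its output as text before parsing. The `["b'-1", ...]` result quoted earlier in this thread looks like raw bytes that were converted with `str()` before being split; that reading is an inference. The sketch below is hypothetical and relies on several assumptions: the script path in SCRIPT, that the result is the last line containing `<>`, and that the line has the `score<>term(CUI)<>term(CUI)` shape seen in Ted's outputs. Only flags that appear in this thread are used.

```python
import re
import subprocess

SCRIPT = "query-umls-similarity-webinterface.pl"  # assumed path to the Perl script
TERM_WITH_CUI = re.compile(r"^(?P<name>.*)\((?P<cui>C\d{7})\)$")


def split_term(term):
    """'bone(C0262950)' -> ('bone', 'C0262950'); a term without a CUI -> (term, None)."""
    match = TERM_WITH_CUI.match(term)
    return (match["name"], match["cui"]) if match else (term, None)


def parse_result_line(line):
    """Parse 'score<>term(CUI)<>term(CUI)' into (score, (name, cui), (name, cui))."""
    score, first, second = line.strip().split("<>", 2)
    return float(score), split_term(first), split_term(second)


def query(a, b, measure="path", sab="MSH", rel="PAR/CHD"):
    """Run the script once; text=True decodes stdout, so no b'...' prefixes appear."""
    cmd = ["perl", SCRIPT, "--measure", measure, "--sab", sab, "--rel", rel, a, b]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    result_lines = [line for line in out.splitlines() if "<>" in line]
    return parse_result_line(result_lines[-1])
```

For example, `parse_result_line("0.8<>femur(C0015811)<>skull(C0037303)")` gives `(0.8, ("femur", "C0015811"), ("skull", "C0037303"))`. Note the scale: ~30K terms means 30000 × 29999 / 2 ≈ 4.5 × 10^8 unordered pairs. One subprocess call and one web request per pair is therefore unlikely to be practical at that size.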
Here's the command I am using:

./query-umls-similarity-webinterface.pl --sab MSH --rel PAR/CHD "Abortion" "Miscarriage"

Default Settings:
--default http://atlas.ahc.umn.edu/
--measure path

User Settings:
--rel PAR/CHD

(-1.0, 'Abortion', 'Miscarriage')

I also have not processed the text in my dataset much. I have basically pulled diseases and phenotypes from DisGeNet, OMIM, PheWAS, and the GWAS catalogue. If I'm using data from all of these sources, do you recommend sending them directly to the query interface? Should I try to map them to CUIs? (examples below)

Before I download the database and attempt to query it (it's not a language that I use in my current work), I just wanted an outside perspective to see if there are best practices for using this data. Thank you in advance for your time!

(examples)
Here are two more examples showing the disease descriptions in my dataset. Is the UMLS interface robust to these various formats, or do they need to be an exact match?
./query-umls-similarity-webinterface.pl --sab MSH --rel PAR/CHD "Testicular Neoplasms" "Amelogenesis imperfecta local hypoplastic form"

Default Settings:
--default http://atlas.ahc.umn.edu/
--measure path

User Settings:
--rel PAR/CHD

(-1.0, 'Testicular Neoplasms', 'Amelogenesis imperfecta local hypoplastic form')

./query-umls-similarity-webinterface.pl --sab MSH --rel PAR/CHD "Hypotrichosis 2, 146520 (3)" "PERIODONTITIS, LOCALIZED AGGRESSIVE"

Default Settings:
--default http://atlas.ahc.umn.edu/
--measure path

User Settings:
--rel PAR/CHD

(-1.0, 'Hypotrichosis 2, 146520 (3)', 'PERIODONTITIS, LOCALIZED AGGRESSIVE')

--
Jennifer L. Wilson
Bioengineering, Stanford University
jen.wilson...@gmail.com / 703.969.3318