Hey Ted,

Thanks for all of the help. I found the interactive interface really helpful and was able to create inputs similar to what you shared. I have an open help ticket now about getting the file to download. He gave me some commands to try that I had already tried, so there must be something else to unzipping the code...
Thanks again. Hopefully I'm close to a solution!

On Mon, Jun 5, 2017 at 11:21 AM, Ted Pedersen duluth...@gmail.com [umls-similarity] <umls-similarity@yahoogroups.com> wrote:

> Hi Jen,
>
> Nothing to be embarrassed about at all! If you haven't already used MetaMap interactively you might want to try that before you attempt a local install:
>
> https://ii.nlm.nih.gov/Interactive/UTS_Required/metamap.shtml
>
> (You would need to be logged into UTS for the link to work I think...)
>
> Anyway, once at that site on the right side there are some links for using MetaMap interactively. Below is an example of what that looks like (where the first line is my input and the rest is the output). I turned on the option to show CUIs, since I think that is your desired output...
>
> About the bz2 file, I think you'd need to uncompress that with bunzip2, although I have not done a local install for a while so I am not 100 percent sure if that is the issue or not. But, I've cc'd the MetaMap help line on this note, they are usually very good about following up on issues like this.
>
> I hope this helps!
> Ted
>
> Processing 00000000.tx.1: I have a really bad headache, and my joints ache.
>
> Phrase: I
> >>>>> Phrase
> i
> <<<<< Phrase
> >>>>> Mappings
> Meta Mapping (1000):
>   1000 C0021966:I- (Iodides) [Inorganic Chemical]
> Meta Mapping (1000):
>   1000 C0221138:I NOS (Blood group antibody I) [Amino Acid, Peptide, or Protein,Immunologic Factor]
> <<<<< Mappings
>
> Phrase: have
> >>>>> Phrase
> <<<<< Phrase
>
> Phrase: a really bad headache,
> >>>>> Phrase
> really bad headache
> <<<<< Phrase
> >>>>> Mappings
> Meta Mapping (790):
>   660 C0205169:Bad [Qualitative Concept]
>   827 C0018681:HEADACHE (Headache) [Sign or Symptom]
> <<<<< Mappings
>
> Phrase: and
> >>>>> Phrase
> <<<<< Phrase
>
> Phrase: my joints
> >>>>> Phrase
> joints
> <<<<< Phrase
> >>>>> Mappings
> Meta Mapping (1000):
>   1000 C0022417:Joints [Body Space or Junction]
> Meta Mapping (1000):
>   1000 C0392905:Joints (Articular system) [Body System]
> <<<<< Mappings
>
> Phrase: ache.
> >>>>> Phrase
> ache
> <<<<< Phrase
> >>>>> Mappings
> Meta Mapping (1000):
>   1000 C0234238:ACHE (Ache) [Sign or Symptom]
> <<<<< Mappings
>
> On Mon, Jun 5, 2017 at 12:25 PM, Jennifer Wilson jen.wilson...@gmail.com [umls-similarity] <umls-similarity@yahoogroups.com> wrote:
>
>> Hey Ted,
>>
>> I'm (embarrassingly) having some trouble navigating the NLM site. I think I have an account and am trying to download some of the MetaMap software (I think that the "Lite" version is sufficient). But when I download the bz2 file, it won't open because I think I need to authenticate it. Do you know how I'm supposed to access this software? Sorry if this is out of your realm, I can try someone else at NLM. This has just been a lot more difficult and confusing than I thought it should be! Thanks,
>>
>> On Fri, Jun 2, 2017 at 7:07 PM, Ted Pedersen duluth...@gmail.com [umls-similarity] <umls-similarity@yahoogroups.com> wrote:
>>
>>> Hi Jennifer,
>>>
>>> Mapping terms to CUIs is its own problem, and there are a few nice tools already available that might be of some use.
>>> We've used MetaMap to good effect for this problem, so you might want to consider looking there.
>>>
>>> https://metamap.nlm.nih.gov/
>>>
>>> I'd be curious if other users have recommendations as well.
>>>
>>> Good luck,
>>> Ted
>>>
>>> On Fri, Jun 2, 2017 at 7:56 PM, Jennifer Wilson jen.wilson...@gmail.com [umls-similarity] <umls-similarity@yahoogroups.com> wrote:
>>>
>>>> Hi Ted,
>>>>
>>>> Thank you again for all of this. I'm sorry I had to put down this project for a few days and am only now getting back to it.
>>>>
>>>> I see that ontologies change and reproducing that result might not be the best sanity check on the scripts that I wrote.
>>>>
>>>> I'm going to try and figure out how to map to CUI terms and I'll be in touch if I get stuck again. Thanks,
>>>>
>>>> On Sun, May 28, 2017 at 10:59 AM, Ted Pedersen duluth...@gmail.com [umls-similarity] <umls-similarity@yahoogroups.com> wrote:
>>>>
>>>>> This is perhaps a bit more than you were looking for, but there are quite a few command line tools available with UMLS::Similarity when you install locally that can be helpful for digging into situations like this. When I look for the path from each of these CUIs to the ROOT (of MSH) I find that one of them does not have a path to the root, while the other does (see command output below).
>>>>>
>>>>> The lack of a path to the root is going to cause a lot of measures to report a -1 value (since path, for example, relies on finding this path as a part of its computation). In fact, not having a path to the root makes me question if C0156543 is in MSH at all, so it might even be that the CUI is no longer a part of MSH (and not just lacking a path to the root). But, regardless, clearly something has changed since 2009 that is causing this measure to return a different value.
>>>>> This happens in some cases since UMLS continues to evolve and CUIs are added, removed, etc. It's important to know what version of the UMLS a previous study has used if you are interested in getting a very exact comparison. In the case of our AMIA 2009 paper we used 2008AB, so things have no doubt changed a bit since then.
>>>>>
>>>>> tpederse@maraca:~$ findPathToRoot.pl C0156543
>>>>>
>>>>> UMLS-Interface Configuration Information:
>>>>> (Default Information - no config file)
>>>>>
>>>>> Sources (SAB):
>>>>>   MSH
>>>>> Relations (REL):
>>>>>   PAR
>>>>>   CHD
>>>>>
>>>>> Sources (SABDEF):
>>>>>   UMLS_ALL
>>>>> Relations (RELDEF):
>>>>>   UMLS_ALL
>>>>>
>>>>> There are no paths from the given C0156543 to the root.
>>>>>
>>>>> tpederse@maraca:~$ findPathToRoot.pl C0000786
>>>>>
>>>>> UMLS-Interface Configuration Information:
>>>>> (Default Information - no config file)
>>>>>
>>>>> Sources (SAB):
>>>>>   MSH
>>>>> Relations (REL):
>>>>>   PAR
>>>>>   CHD
>>>>>
>>>>> Sources (SABDEF):
>>>>>   UMLS_ALL
>>>>> Relations (RELDEF):
>>>>>   UMLS_ALL
>>>>>
>>>>> The paths between abortions, spontaneous (C0000786) and the root:
>>>>> => C0000000 (**UMLS ROOT**) C1135584 (mesh headings) C1256739 (mesh descriptors) C1256741 (topical descriptor) C0012674 (diseases (mesh category)) C1720765 (female urogenital dis pregnancy compl) C0032962 (compl pregn) C0000786 (abortions, spontaneous)
>>>>>
>>>>> On Sun, May 28, 2017 at 12:43 PM, Ted Pedersen <duluth...@gmail.com> wrote:
>>>>>
>>>>>> Hi Jennifer,
>>>>>>
>>>>>> Thanks for sharing this question. I think in general if you have a choice between using CUIs or terms with UMLS::Similarity, your best option is to use the CUIs. Terms can map to multiple CUIs, and UMLS::Similarity might pick a CUI associated with a sense of the term you aren't intending.
>>>>>> Also, if you misspell a term or don't specify it exactly correctly, then it shows up as not found. One useful resource for replicating similarity measure studies (like the one you cite) is the following page which includes term mappings for several of the datasets we've worked with over the years.
>>>>>>
>>>>>> http://www-users.cs.umn.edu/~bthomson/corpus/corpus.html
>>>>>>
>>>>>> I will admit to being a little puzzled about the case of abortion - miscarriage. The paper you cite clearly reports a value based on MSH, but as I try to run that query now I get a value of -1 (even when using the CUIs). However, it appears that each of the CUIs is found in MSH, but that somehow we are not able to compute some of the measures (a path length, for example). This suggests that there is not a path between the two CUIs, which has something to do with the structure of UMLS/MSH.
>>>>>>
>>>>>> One quick and dirty way to see if a CUI is in MSH is to find the path length between a CUI and itself. If it is present in MSH, that value will be 1. We see that for each of the CUIs used for abortion and miscarriage.
>>>>>>
>>>>>> tpederse@maraca:~$ perl query-umls-similarity-webinterface.pl --measure path --sab MSH C0156543 C0156543
>>>>>> Default Settings:
>>>>>>   --default http://atlas.ahc.umn.edu/
>>>>>>   --rel PAR/CHD
>>>>>> User Settings:
>>>>>>   --measure path
>>>>>>
>>>>>> 1<>Unspecified abortion NOS(C0156543)<>Unspecified abortion NOS(C0156543)
>>>>>>
>>>>>> tpederse@maraca:~$ perl query-umls-similarity-webinterface.pl --measure path --sab MSH C0000786 C0000786
>>>>>> Default Settings:
>>>>>>   --default http://atlas.ahc.umn.edu/
>>>>>>   --rel PAR/CHD
>>>>>> User Settings:
>>>>>>   --measure path
>>>>>>
>>>>>> 1<>Abortions.spontaneous(C0000786)<>Abortions.spontaneous(C0000786)
>>>>>>
>>>>>> However, when I try to find the path length between the two CUIs, I get -1. This suggests that the CUIs are not joined by PAR/CHD relations. Note that below you can see that the terms for the CUIs have been looked up, which shows us that MSH knows about them...
>>>>>>
>>>>>> tpederse@maraca:~$ perl query-umls-similarity-webinterface.pl --measure path --sab MSH C0156543 C0000786
>>>>>> Default Settings:
>>>>>>   --default http://atlas.ahc.umn.edu/
>>>>>>   --rel PAR/CHD
>>>>>> User Settings:
>>>>>>   --measure path
>>>>>>
>>>>>> -1<>Unspecified abortion NOS(C0156543)<>Abortions.spontaneous(C0000786)
>>>>>>
>>>>>> So, in any case, it would appear that something has changed in the structure of MSH since we reported our results in the 2009 AMIA paper you mention. I'm not sure what that is. But, I think the general message is that if you can use CUIs it will normally be more reliable to do that. Mapping terms to CUIs is of course its own problem, but UMLS::Similarity doesn't do anything terribly fancy with that, and so probably whatever you do will be more extensive and reliable than what UMLS::Similarity would do...
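A note for anyone scripting against this output: the result lines quoted above follow a `score<>term(CUI)<>term(CUI)` pattern. Here is a minimal Python sketch of parsing one such line. The names `parse_similarity_line` and `SimilarityResult` are hypothetical, and mapping any negative score to `None` is an assumption based on the -1 results discussed in this thread, not documented behavior of the tool.

```python
import re
from typing import NamedTuple, Optional, Tuple

# Matches a field that ends in a parenthesised CUI,
# e.g. "Unspecified abortion NOS(C0156543)".
_CUI_SUFFIX = re.compile(r"^(?P<term>.*)\((?P<cui>C\d{7})\)$")


class SimilarityResult(NamedTuple):
    score: Optional[float]  # None when the tool reported a negative score
    term1: str
    cui1: Optional[str]     # None when the field carried no CUI
    term2: str
    cui2: Optional[str]


def _split_term(field: str) -> Tuple[str, Optional[str]]:
    """Split 'term(CUI)' into (term, CUI); return (field, None) if no CUI."""
    match = _CUI_SUFFIX.match(field.strip())
    if match:
        return match.group("term").strip(), match.group("cui")
    return field.strip(), None


def parse_similarity_line(line: str) -> SimilarityResult:
    """Parse one 'score<>term(CUI)<>term(CUI)' result line."""
    parts = line.strip().split("<>")
    if len(parts) != 3:
        raise ValueError(f"expected 3 '<>'-separated fields, got {len(parts)}")
    score = float(parts[0])
    term1, cui1 = _split_term(parts[1])
    term2, cui2 = _split_term(parts[2])
    # Assumption: a negative score means no value could be computed.
    return SimilarityResult(score if score >= 0 else None, term1, cui1, term2, cui2)
```

Keeping the "no value" case as `None` rather than -1 makes it harder to average a sentinel into a similarity matrix by accident.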
>>>>>>
>>>>>> I hope this helps somehow, and please do feel free to follow up. Thoughts from other users on this issue would also be most welcome!
>>>>>>
>>>>>> Cordially,
>>>>>> Ted
>>>>>>
>>>>>> On Sat, May 27, 2017 at 12:18 PM, Jennifer Wilson jen.wilson...@gmail.com [umls-similarity] <umls-similarity@yahoogroups.com> wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I'm resending this now that I'm subscribed. Any advice would be much appreciated! Thank you,
>>>>>>>
>>>>>>> ---------- Forwarded message ----------
>>>>>>> From: Jennifer Wilson <jen.wilson...@gmail.com>
>>>>>>> Date: Tue, May 23, 2017 at 6:13 PM
>>>>>>> Subject: Help with the best approach for using the query-UMLS interface
>>>>>>> To: umls-similarity@yahoogroups.com
>>>>>>>
>>>>>>> Hello UMLS similarity team,
>>>>>>>
>>>>>>> I am trying to compute the similarity between ~30K disease/phenotype terms. Ideally, I would have a matrix of similarity for these terms.
>>>>>>>
>>>>>>> My first attempt was to write a python script to call the query-umls-similarity-webinterface.pl script. Before releasing the script on my dataset, though, I was trying to recreate the scores in table 1 of this paper (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2815481/).
>>>>>>>
>>>>>>> Here's the command I am using:
>>>>>>>
>>>>>>> ./query-umls-similarity-webinterface.pl --sab MSH --rel PAR/CHD "Abortion" "Miscarriage"
>>>>>>>
>>>>>>> Default Settings:
>>>>>>>   --default http://atlas.ahc.umn.edu/
>>>>>>>   --measure path
>>>>>>>
>>>>>>> User Settings:
>>>>>>>   --rel PAR/CHD
>>>>>>>
>>>>>>> (-1.0, 'Abortion', 'Miscarriage')
>>>>>>>
>>>>>>> I also have not processed the text in my dataset much. I have basically pulled diseases and phenotypes from DisGeNET, OMIM, PheWAS, and the GWAS catalogue.
>>>>>>> If I'm using data from all of these sources - do you recommend sending them directly to the query interface? Should I try and map to CUI terms? (examples below)
>>>>>>>
>>>>>>> Before I download the database and attempt to query the database (it's not a language that I use in my current work), I just wanted an outside perspective to see if there are best practices for using this data. Thank you in advance for your time!
>>>>>>>
>>>>>>> (examples)
>>>>>>> Here are two more examples showing the disease descriptions in my dataset. Is the UMLS interface robust to these various formats or do they need to be an exact match?
>>>>>>>
>>>>>>> ./query-umls-similarity-webinterface.pl --sab MSH --rel PAR/CHD "Testicular Neoplasms" "Amelogenesis imperfecta local hypoplastic form"
>>>>>>>
>>>>>>> Default Settings:
>>>>>>>   --default http://atlas.ahc.umn.edu/
>>>>>>>   --measure path
>>>>>>>
>>>>>>> User Settings:
>>>>>>>   --rel PAR/CHD
>>>>>>>
>>>>>>> (-1.0, 'Testicular Neoplasms', 'Amelogenesis imperfecta local hypoplastic form')
>>>>>>>
>>>>>>> ./query-umls-similarity-webinterface.pl --sab MSH --rel PAR/CHD "Hypotrichosis 2, 146520 (3)" "PERIODONTITIS, LOCALIZED AGGRESSIVE"
>>>>>>>
>>>>>>> Default Settings:
>>>>>>>   --default http://atlas.ahc.umn.edu/
>>>>>>>   --measure path
>>>>>>>
>>>>>>> User Settings:
>>>>>>>   --rel PAR/CHD
>>>>>>>
>>>>>>> (-1.0, 'Hypotrichosis 2, 146520 (3)', 'PERIODONTITIS, LOCALIZED AGGRESSIVE')
>>>>>>>
>>>>>>> --
>>>>>>> Jennifer L. Wilson
>>>>>>> Bioengineering, Stanford University
>>>>>>> jen.wilson...@gmail.com / 703.969.3318
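
On the bz2 question at the top of the thread: one quick check is whether the downloaded file is really a bzip2 archive. bzip2 streams begin with the bytes `BZh`, so a file that starts with anything else (for example, an HTML login page saved under a `.bz2` name, which is only a guess at one possible cause here) will not decompress with bunzip2. Below is a minimal Python sketch of that check using only the standard library; the function names and file names are hypothetical placeholders.

```python
import bz2
import shutil
from pathlib import Path


def looks_like_bzip2(path: Path) -> bool:
    """Return True if the file starts with the bzip2 signature 'BZh'."""
    with path.open("rb") as fh:
        return fh.read(3) == b"BZh"


def decompress_bz2(src: Path, dest: Path) -> None:
    """Decompress src into dest, leaving src in place (similar to `bunzip2 -k`)."""
    if not looks_like_bzip2(src):
        raise ValueError(f"{src} does not start with the bzip2 signature")
    # Stream the data so large archives are not read into memory at once.
    with bz2.open(src, "rb") as fin, dest.open("wb") as fout:
        shutil.copyfileobj(fin, fout)
```

For example, `decompress_bz2(Path("download.tar.bz2"), Path("download.tar"))` would raise a `ValueError` right away if the download is not a real archive, which narrows the problem to the download step rather than the unzip step.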