Hi Jessica – Many thanks for the insight. I see where this is going wrong. Yes, DOC and ADR are present in the text. However, DOC appears as “.doc”, which is a file extension and not a drug in the context of the text. Also, ADR is used in the document as an abbreviation for “Adverse Drug Reaction”.
I think the only way to exclude such words would be to pre-process the text before passing it to cTAKES. Is there any other way within cTAKES to achieve this (e.g., passing a file of stop words that includes some of these abbreviations)?

Thanks
Sekhar H.

From: Jessica Glover <glover.jessic...@gmail.com>
Sent: Wednesday, June 5, 2019 1:07 AM
To: user@ctakes.apache.org
Subject: Re: cTAKES output

Hi Sekhar,

Do you use the CAS Visual Debugger (CVD), or even a text editor that will show you the character positions of the document text? I can see from your output that the evidence spans for each RxNorm code are annotated.

Code      Evidence span offsets
10311     793-797
3256      1152-1155
450530    576-584
217992    452-460
3639      2454-2457

Look in those places in your document to find out what language is triggering these codes.

Other things to note: A common abbreviation of Deoxycorticosterone is "DOC". I bolded where I see DOC and 3256 in your output. Similarly, "ADR" is another way to express Doxorubicin, and I've bolded that in your output as well. See below.
{'ANATOMICALSITEMENTION': {'ORAL': ['START: 793', 'END: 797', 'POLARITY: 1', '[CODINGSCHEME: SNOMEDCT_US, CODE: 74262004, CUI: C0226896, TUI: T030]']}, 'MEDICATIONMENTION': {'TABLETS': ['START: 474', 'END: 481', 'POLARITY: 1', '[CODINGSCHEME: SNOMEDCT_US, CODE: 46992007, CUI: C0039225, TUI: T122]', '[CODINGSCHEME: SNOMEDCT_US, CODE: 385055001, CUI: C0039225, TUI: T122]', '[CODINGSCHEME: RXNORM, CODE: 10311, CUI: C0039225, TUI: T122]'], 'INJECTION': ['START: 817', 'END: 826', 'POLARITY: 1', '[CODINGSCHEME: SNOMEDCT_US, CODE: 129326001, CUI: C1533685, TUI: T061]', '[CODINGSCHEME: SNOMEDCT_US, CODE: 28289002, CUI: C1533685, TUI: T061]', '[CODINGSCHEME: SNOMEDCT_US, CODE: 59108006, CUI: C1533685, TUI: T061]'], 'DOC': ['START: 1152', 'END: 1155', 'POLARITY: 1', '[CODINGSCHEME: RXNORM, CODE: 3256, CUI: C0011710, TUI: T109]', '[CODINGSCHEME: RXNORM, CODE: 3256, CUI: C0011710, TUI: T121]', '[CODINGSCHEME: SNOMEDCT_US, CODE: 56156001, CUI: C0011710, TUI: T125]', '[CODINGSCHEME: SNOMEDCT_US, CODE: 1336006, CUI: C0011710, TUI: T109]', '[CODINGSCHEME: SNOMEDCT_US, CODE: 1336006, CUI: C0011710, TUI: T121]', '[CODINGSCHEME: SNOMEDCT_US, CODE: 75029008, CUI: C0011710, TUI: T125]', '[CODINGSCHEME: RXNORM, CODE: 3256, CUI: C0011710, TUI: T125]', '[CODINGSCHEME: SNOMEDCT_US, CODE: 56156001, CUI: C0011710, TUI: T109]', '[CODINGSCHEME: SNOMEDCT_US, CODE: 56156001, CUI: C0011710, TUI: T121]', '[CODINGSCHEME: SNOMEDCT_US, CODE: 1336006, CUI: C0011710, TUI: T125]', '[CODINGSCHEME: SNOMEDCT_US, CODE: 75029008, CUI: C0011710, TUI: T109]', '[CODINGSCHEME: SNOMEDCT_US, CODE: 75029008, CUI: C0011710, TUI: T121]'], 'SOLUTION': ['START: 576', 'END: 584', 'POLARITY: 1', '[CODINGSCHEME: RXNORM, CODE: 450530, CUI: C1382100, TUI: T122]'], 'ORAL': ['START: 793', 'END: 797', 'POLARITY: 1', '[CODINGSCHEME: SNOMEDCT_US, CODE: 74262004, CUI: C0226896, TUI: T030]'], 'LEVAQUIN': ['START: 452', 'END: 460', 'POLARITY: 1', '[CODINGSCHEME: RXNORM, CODE: 217992, CUI: C0721336, TUI: T121]', '[CODINGSCHEME: 
RXNORM, CODE: 217992, CUI: C0721336, TUI: T109]'], 'ADR': ['START: 2454', 'END: 2457', 'POLARITY: 1', '[CODINGSCHEME: SNOMEDCT_US, CODE: 68444001, CUI: C0013089, TUI: T195]', '[CODINGSCHEME: SNOMEDCT_US, CODE: 68444001, CUI: C0013089, TUI: T109]', '[CODINGSCHEME: RXNORM, CODE: 3639, CUI: C0013089, TUI: T195]', '[CODINGSCHEME: RXNORM, CODE: 3639, CUI: C0013089, TUI: T109]', '[CODINGSCHEME: SNOMEDCT_US, CODE: 372817009, CUI: C0013089, TUI: T195]', '[CODINGSCHEME: SNOMEDCT_US, CODE: 372817009, CUI: C0013089, TUI: T109]']}, 'DRUGCHANGESTATUSANNOTATION': {}, 'STRENGTHANNOTATION': {}, 'FRACTIONSTRENGTHANNOTATION': {}, 'FREQUENCYUNITANNOTATION': {}, 'DISEASEDISORDERMENTION': {'RUPTURE': ['START: 1579', 'END: 1586', 'POLARITY: 1', '[CODINGSCHEME: SNOMEDCT_US, CODE: 125671007, CUI: C3203359, TUI: T037]']}, 'SIGNSYMPTOMMENTION': {'RED': ['START: 1081', 'END: 1084', 'POLARITY: 1', '[CODINGSCHEME: SNOMEDCT_US, CODE: 386713009, CUI: C0332575, TUI: T033]'], 'CONTENT': ['START: 2992', 'END: 2999', 'POLARITY: 1', '[CODINGSCHEME: SNOMEDCT_US, CODE: 271599002, CUI: C0423896, TUI: T041]'], 'HISTORY': ['START: 34', 'END: 41', 'POLARITY: 1', '[CODINGSCHEME: SNOMEDCT_US, CODE: 392521001, CUI: C0262926, TUI: T033]']}, 'ROUTEANNOTATION': {}, 'DATEANNOTATION': {}, 'MEASUREMENTANNOTATION': {}, 'PROCEDUREMENTION': {'INJECTION': ['START: 817', 'END: 826', 'POLARITY: 1', '[CODINGSCHEME: SNOMEDCT_US, CODE: 129326001, CUI: C1533685, TUI: T061]', '[CODINGSCHEME: SNOMEDCT_US, CODE: 28289002, CUI: C1533685, TUI: T061]', '[CODINGSCHEME: SNOMEDCT_US, CODE: 59108006, CUI: C1533685, TUI: T061]']}, 'TIMEMENTION': {}, 'STRENGTHUNITANNOTATION': {}}

Hope this helps,
Jessica

On Tue, Jun 4, 2019 at 1:40 PM gandhi rajan <gandhiraja...@gmail.com> wrote:

Hi Sekhar,

To answer your first question: as far as I know, there is no config change that filters the output. You have to pick out the desired output according to your requirements by parsing the output XML.
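
Gandhi's suggestion of picking out the desired output on the client side could look roughly like the Python below. This is only a minimal sketch, assuming the response has the dict shape quoted in this thread. The function name `rxnorm_codes`, its `exclude` parameter, and the abbreviated `sample` data are hypothetical illustrations and not part of cTAKES. The `exclude` set is a client-side post-filter for surface forms such as DOC and ADR; it is not a cTAKES stop-word setting.

```python
import re

# Matches entries such as
# '[CODINGSCHEME: RXNORM, CODE: 3256, CUI: C0011710, TUI: T109]'
CODE_ENTRY = re.compile(
    r"\[CODINGSCHEME: (?P<scheme>[^,]+), CODE: (?P<code>[^,]+), "
    r"CUI: (?P<cui>[^,]+), TUI: (?P<tui>[^\]]+)\]"
)


def rxnorm_codes(response, section="MEDICATIONMENTION", exclude=()):
    """Return {surface form: sorted unique RXNORM codes} for one section.

    `exclude` lists surface forms to drop (case-insensitive). This is a
    hypothetical client-side filter, not a cTAKES option.
    """
    skip = {s.upper() for s in exclude}
    result = {}
    for surface, entries in response.get(section, {}).items():
        if surface.upper() in skip:
            continue
        codes = set()
        for entry in entries:
            # 'START: ...', 'END: ...' and 'POLARITY: ...' do not match.
            m = CODE_ENTRY.fullmatch(entry)
            if m and m.group("scheme") == "RXNORM":
                codes.add(m.group("code"))
        if codes:
            result[surface] = sorted(codes)
    return result


# Abbreviated copy of the response quoted in this thread.
sample = {
    "MEDICATIONMENTION": {
        "LEVAQUIN": [
            "START: 452", "END: 460", "POLARITY: 1",
            "[CODINGSCHEME: RXNORM, CODE: 217992, CUI: C0721336, TUI: T121]",
            "[CODINGSCHEME: RXNORM, CODE: 217992, CUI: C0721336, TUI: T109]",
        ],
        "DOC": [
            "START: 1152", "END: 1155", "POLARITY: 1",
            "[CODINGSCHEME: RXNORM, CODE: 3256, CUI: C0011710, TUI: T109]",
            "[CODINGSCHEME: SNOMEDCT_US, CODE: 56156001, CUI: C0011710, TUI: T125]",
        ],
        "INJECTION": [
            "START: 817", "END: 826", "POLARITY: 1",
            "[CODINGSCHEME: SNOMEDCT_US, CODE: 129326001, CUI: C1533685, TUI: T061]",
        ],
    }
}

print(rxnorm_codes(sample, exclude={"DOC", "ADR"}))
```

Filtering on the surface form is blunt: it would also drop a genuine mention of deoxycorticosterone written as "DOC", so the exclusion list should be kept specific to the document set at hand.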

On Tuesday, June 4, 2019, Hari, Sekhar <sekhar.h...@cgi.com> wrote:

Hi All – I see something that is not correct in the cTAKES output for the text below. I sincerely hope somebody can guide me here with my questions at the end. Not sure if I’m doing anything wrong with the cTAKES configuration.

Content: “Since the last approved labeling, there has been no submission to LEVAQUIN® NDAs: NDA 20-634 LEVAQUIN® (levofloxacin) Tablets, NDA 20-635 LEVAQUIN® (levofloxacin) Injection, NDA 21-721 LEVAQUIN® (levofloxacin) Oral Solution.”

There are several lines after this. But the only brand name of the drug that is mentioned in the whole document is ‘LEVAQUIN’ and generic name mentioned is ‘levofloxacin’. These names appear at a couple of places in the document, and then there are some disease names mentioned too.

Objective: Retrieve the generic name and brand name from the text using the cTAKES returned RXNORM codes.

We do a POST of the full text to the API - http://XX.XX.XX.XX/ctakes-web-rest/service/analyze.
…following is the output from API: {'ANATOMICALSITEMENTION': {'ORAL': ['START: 793', 'END: 797', 'POLARITY: 1', '[CODINGSCHEME: SNOMEDCT_US, CODE: 74262004, CUI: C0226896, TUI: T030]']}, 'MEDICATIONMENTION': {'TABLETS': ['START: 474', 'END: 481', 'POLARITY: 1', '[CODINGSCHEME: SNOMEDCT_US, CODE: 46992007, CUI: C0039225, TUI: T122]', '[CODINGSCHEME: SNOMEDCT_US, CODE: 385055001, CUI: C0039225, TUI: T122]', '[CODINGSCHEME: RXNORM, CODE: 10311, CUI: C0039225, TUI: T122]'], 'INJECTION': ['START: 817', 'END: 826', 'POLARITY: 1', '[CODINGSCHEME: SNOMEDCT_US, CODE: 129326001, CUI: C1533685, TUI: T061]', '[CODINGSCHEME: SNOMEDCT_US, CODE: 28289002, CUI: C1533685, TUI: T061]', '[CODINGSCHEME: SNOMEDCT_US, CODE: 59108006, CUI: C1533685, TUI: T061]'], 'DOC': ['START: 1152', 'END: 1155', 'POLARITY: 1', '[CODINGSCHEME: RXNORM, CODE: 3256, CUI: C0011710, TUI: T109]', '[CODINGSCHEME: RXNORM, CODE: 3256, CUI: C0011710, TUI: T121]', '[CODINGSCHEME: SNOMEDCT_US, CODE: 56156001, CUI: C0011710, TUI: T125]', '[CODINGSCHEME: SNOMEDCT_US, CODE: 1336006, CUI: C0011710, TUI: T109]', '[CODINGSCHEME: SNOMEDCT_US, CODE: 1336006, CUI: C0011710, TUI: T121]', '[CODINGSCHEME: SNOMEDCT_US, CODE: 75029008, CUI: C0011710, TUI: T125]', '[CODINGSCHEME: RXNORM, CODE: 3256, CUI: C0011710, TUI: T125]', '[CODINGSCHEME: SNOMEDCT_US, CODE: 56156001, CUI: C0011710, TUI: T109]', '[CODINGSCHEME: SNOMEDCT_US, CODE: 56156001, CUI: C0011710, TUI: T121]', '[CODINGSCHEME: SNOMEDCT_US, CODE: 1336006, CUI: C0011710, TUI: T125]', '[CODINGSCHEME: SNOMEDCT_US, CODE: 75029008, CUI: C0011710, TUI: T109]', '[CODINGSCHEME: SNOMEDCT_US, CODE: 75029008, CUI: C0011710, TUI: T121]'], 'SOLUTION': ['START: 576', 'END: 584', 'POLARITY: 1', '[CODINGSCHEME: RXNORM, CODE: 450530, CUI: C1382100, TUI: T122]'], 'ORAL': ['START: 793', 'END: 797', 'POLARITY: 1', '[CODINGSCHEME: SNOMEDCT_US, CODE: 74262004, CUI: C0226896, TUI: T030]'], 'LEVAQUIN': ['START: 452', 'END: 460', 'POLARITY: 1', '[CODINGSCHEME: RXNORM, CODE: 217992, CUI: 
C0721336, TUI: T121]', '[CODINGSCHEME: RXNORM, CODE: 217992, CUI: C0721336, TUI: T109]'], 'ADR': ['START: 2454', 'END: 2457', 'POLARITY: 1', '[CODINGSCHEME: SNOMEDCT_US, CODE: 68444001, CUI: C0013089, TUI: T195]', '[CODINGSCHEME: SNOMEDCT_US, CODE: 68444001, CUI: C0013089, TUI: T109]', '[CODINGSCHEME: RXNORM, CODE: 3639, CUI: C0013089, TUI: T195]', '[CODINGSCHEME: RXNORM, CODE: 3639, CUI: C0013089, TUI: T109]', '[CODINGSCHEME: SNOMEDCT_US, CODE: 372817009, CUI: C0013089, TUI: T195]', '[CODINGSCHEME: SNOMEDCT_US, CODE: 372817009, CUI: C0013089, TUI: T109]']}, 'DRUGCHANGESTATUSANNOTATION': {}, 'STRENGTHANNOTATION': {}, 'FRACTIONSTRENGTHANNOTATION': {}, 'FREQUENCYUNITANNOTATION': {}, 'DISEASEDISORDERMENTION': {'RUPTURE': ['START: 1579', 'END: 1586', 'POLARITY: 1', '[CODINGSCHEME: SNOMEDCT_US, CODE: 125671007, CUI: C3203359, TUI: T037]']}, 'SIGNSYMPTOMMENTION': {'RED': ['START: 1081', 'END: 1084', 'POLARITY: 1', '[CODINGSCHEME: SNOMEDCT_US, CODE: 386713009, CUI: C0332575, TUI: T033]'], 'CONTENT': ['START: 2992', 'END: 2999', 'POLARITY: 1', '[CODINGSCHEME: SNOMEDCT_US, CODE: 271599002, CUI: C0423896, TUI: T041]'], 'HISTORY': ['START: 34', 'END: 41', 'POLARITY: 1', '[CODINGSCHEME: SNOMEDCT_US, CODE: 392521001, CUI: C0262926, TUI: T033]']}, 'ROUTEANNOTATION': {}, 'DATEANNOTATION': {}, 'MEASUREMENTANNOTATION': {}, 'PROCEDUREMENTION': {'INJECTION': ['START: 817', 'END: 826', 'POLARITY: 1', '[CODINGSCHEME: SNOMEDCT_US, CODE: 129326001, CUI: C1533685, TUI: T061]', '[CODINGSCHEME: SNOMEDCT_US, CODE: 28289002, CUI: C1533685, TUI: T061]', '[CODINGSCHEME: SNOMEDCT_US, CODE: 59108006, CUI: C1533685, TUI: T061]']}, 'TIMEMENTION': {}, 'STRENGTHUNITANNOTATION': {}}

Questions:

1. How do we restrict the output to show only the RXNORM coding scheme? Please describe with a config change example, if possible.
2. These are the unique RXNORM codes from the above output: '10311', '3256', '450530', '217992', '3639'. These codes map to the drug names: 'DESOXYCORTICOSTERONE', 'LEVAQUIN', 'DOXORUBICIN'.
   a. The text does not mention anything about 'DESOXYCORTICOSTERONE' or 'DOXORUBICIN'. How is cTAKES reporting them?
   b. The text has ‘levofloxacin’, but an RXNORM code is not returned for this name. Any idea why?
3. How do we configure cTAKES so that it returns only those codes that are available in the RxTerms dictionary? None of the RXNORM codes reported above are available in RxTerms.

Thanks
Sekhar H.

--
Regards,
Gandhi
"The best way to find urself is to lose urself in the service of others !!!"
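
For question 2a, Jessica's advice to look at the evidence spans can also be followed without CVD. The Python below is a minimal sketch, assuming START and END are 0-based character offsets into the submitted text with an exclusive end (consistent with 'LEVAQUIN' at 452-460 being eight characters long). The helper names `offsets` and `show_span` and the one-line `doc` text are hypothetical and made up for illustration; they are not taken from the original document or from cTAKES.

```python
def offsets(entries):
    """Parse ('START: n', 'END: m', ...) entries into (n, m) integers."""
    fields = dict(e.split(": ", 1) for e in entries if not e.startswith("["))
    return int(fields["START"]), int(fields["END"])


def show_span(text, start, end, width=30):
    """Return the annotated span marked as [[...]] with `width` characters
    of context on each side."""
    left = text[max(0, start - width):start]
    right = text[end:end + width]
    return f"{left}[[{text[start:end]}]]{right}"


# Made-up text standing in for the real document, which is not shown
# in full in this thread.
doc = "Please see the attached report.doc for each ADR (Adverse Drug Reaction)."
start = doc.index("doc")
print(show_span(doc, start, start + 3, width=10))

# With the real text and response, the call would look like:
# start, end = offsets(response["MEDICATIONMENTION"]["DOC"])
# print(show_span(full_text, start, end))
```

Seeing the surrounding characters makes it clear when a hit such as DOC comes from a file extension rather than from a drug mention.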