Thanks for these replies, very helpful!

On Sat, Jan 14, 2017 at 6:19 AM, Savova, Guergana <[email protected]> wrote:
> You can use the MITRE MIST tool for the deidentification. It allows
> re-training, etc. You have to run it as a pre-processor independent of
> cTAKES, then use its output as the input to cTAKES.
>
> http://mist-deid.sourceforge.net/
>
> Complete de-identification is an unsolved problem though, there are no
> guarantees there would be no leaks.
>
> I hope this helps.
>
> --Guergana Savova, PhD, FACMI
> Associate Professor
> PI Natural Language Processing Lab
> Boston Children's Hospital and Harvard Medical School
> 300 Longwood Avenue
> Mailstop: BCH3092
> Enders 144.1
> Boston, MA 02115
> Tel: (617) 919-2972
> Fax: (617) 730-0817
> [email protected]
> Harvard Scholar: http://scholar.harvard.edu/guergana_k_savova/biocv
>
> ctakes.apache.org
> thyme.healthnlp.org
> cancer.healthnlp.org
> share.healthnlp.org
>
> *From:* Dipankar Ray [mailto:[email protected]]
> *Sent:* Friday, January 13, 2017 6:01 PM
> *To:* [email protected]
> *Subject:* de-identification
>
> Hi folks,
>
> Apologies if this is a newbie question - tried to look for an earlier
> occurrence of it, but was unsuccessful.
>
> From this website (https://open.med.harvard.edu/project/scrubber/) I
> learned that the Scrubber de-identification tool is now available as part
> of CTAKES. But I didn't see anything about de-identification listed among
> the components here:
>
> https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.2+Component+Use+Guide
>
> *Question: *How do I use CTAKES for de-identification?
>
> best,
>
> Dipankar
