Just an FYI: there are updated tools/scripts [1] that format and load data for Sean’s new, faster dictionary-lookup module.

[1] http://svn.apache.org/repos/asf/ctakes/sandbox/dictionarytool/

--Pei

From: Masanz, James J. [mailto:[email protected]]
Sent: Saturday, May 17, 2014 12:31 PM
To: '[email protected]'
Subject: RE: Help need to integrate UMLS2013AB dictionary

The tokenizer that used a file of hyphenated words was replaced a while back with a tokenizer that implements the Penn Treebank tokenization rules (TokenizerPTB.java). So as long as you use a recent copy of CreateLuceneIndexFromDelimitedFile, which references TokenizerPTB instead of just Tokenizer, you can ignore the part about a hyphenated.txt file.

-- James

From: Ramprasad Reddy [mailto:[email protected]]
Sent: Friday, May 16, 2014 4:20 PM
To: [email protected]
Subject: Help need to integrate UMLS2013AB dictionary

Hello,

Good evening. I have been trying to add the latest UMLS2013AB data to the resources, similar to UMLS2011AB. I tried to follow the instructions at the following locations:

* https://cabig-kc.nci.nih.gov/Vocab/forums/viewtopic.php?f=28&t=423
* https://cabig-kc.nci.nih.gov/Vocab/forums/viewtopic.php?f=28&t=80&start=20#p1459

I have already extracted the data from the UMLS website and created the pipe-delimited text as well. But it looks like there is a change in the way tokenization is handled (there is no hyphenated.txt). I am getting a "no main class" error while running CreateLuceneIndexFromDelimitedFile.java, and I am also looking for help with the steps to create the 'umls.data' and 'umls.backup' files in HSQLDB, similar to those for umls2011ab.

Sharing any resources or steps would be very helpful.

Thank you,
RP.
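
For anyone preparing the pipe-delimited text mentioned in the original question, a quick sanity check before indexing can catch malformed rows early. The following is only a rough sketch in Python and is not part of cTAKES: the function name check_delimited_lines and the default of 3 fields per row are assumptions made for illustration, and the real column layout depends on how the file was exported.

```python
from typing import Iterable, List, Tuple


def check_delimited_lines(
    lines: Iterable[str], expected_fields: int = 3, delimiter: str = "|"
) -> Tuple[List[List[str]], List[int]]:
    """Split each non-blank line on the delimiter.

    Returns (good_rows, bad_line_numbers). A row counts as bad if it has
    the wrong number of fields or any empty field. expected_fields is a
    hypothetical value; adjust it to the real export layout.
    """
    good_rows: List[List[str]] = []
    bad_line_numbers: List[int] = []
    for line_number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # blank lines are skipped, not reported
        fields = [field.strip() for field in line.split(delimiter)]
        if len(fields) != expected_fields or any(not f for f in fields):
            bad_line_numbers.append(line_number)
        else:
            good_rows.append(fields)
    return good_rows, bad_line_numbers


if __name__ == "__main__":
    # Made-up sample rows, not real UMLS data.
    sample = ["ID1|TYPE1|x-ray", "ID2|TYPE1", "", "ID3||heart attack"]
    rows, bad = check_delimited_lines(sample)
    print(len(rows), "good rows; bad line numbers:", bad)
```

To check a real file, the same function can be given an open file handle, for example check_delimited_lines(open(path, encoding="utf-8")), where path is the location of your own delimited file.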
