Thank you, Chen and James, for your input. I will look into the updated tool and get back to you with any questions I encounter.

On Mon, May 19, 2014 at 10:12 AM, Chen, Pei <[email protected]> wrote:

> Just an FYI:
>
> There are updated tools/scripts[1] that will format/load Sean’s new faster
> dictionary-lookup module.
>
> [1] http://svn.apache.org/repos/asf/ctakes/sandbox/dictionarytool/
>
> --Pei
>
> *From:* Masanz, James J. [mailto:[email protected]]
> *Sent:* Saturday, May 17, 2014 12:31 PM
> *To:* '[email protected]'
> *Subject:* RE: Help need to integrate UMLS2013AB dictionary
>
> The tokenizer that used a file of hyphenated words was replaced a while
> back with a tokenizer that implements the Penn Treebank tokenization rules
> (TokenizerPTB.java).
>
> So as long as you use a recent copy of CreateLuceneIndexFromDelimitedFile,
> which references TokenizerPTB instead of just Tokenizer, you can ignore the
> part about a hypenated.txt file.
>
> -- James
>
> *From:* Ramprasad Reddy [mailto:[email protected]]
> *Sent:* Friday, May 16, 2014 4:20 PM
> *To:* [email protected]
> *Subject:* Help need to integrate UMLS2013AB dictionary
>
> Hello,
>
> Good evening.
>
> I have been trying to add the latest UMLS2013AB data to resources, similar
> to UMLS2011AB. I tried to follow the instructions at the following
> locations:
>
> - https://cabig-kc.nci.nih.gov/Vocab/forums/viewtopic.php?f=28&t=423
> - https://cabig-kc.nci.nih.gov/Vocab/forums/viewtopic.php?f=28&t=80&start=20#p1459
>
> I have already extracted the data from the UMLS website and created the
> pipe-delimited text file as well.
>
> But it looks like there is a change in the way tokenization is handled
> (there is no hypenated.txt).
>
> I am facing a "no main class" error while running
> CreateLuceneIndexFromDelimitedFile.java, and I am also looking for help
> with the steps to create the 'umls.data' and 'umls.backup' files in HSQLDB,
> similar to those for umls2011ab.
>
> Any resources or steps you could share would be very helpful.
>
> Thank you,
>
> RP.
