Just an FYI:
There are updated tools/scripts[1] that format and load dictionaries for 
Sean’s new, faster dictionary-lookup module.
[1] http://svn.apache.org/repos/asf/ctakes/sandbox/dictionarytool/

--Pei

From: Masanz, James J. [mailto:[email protected]]
Sent: Saturday, May 17, 2014 12:31 PM
To: '[email protected]'
Subject: RE: Help need to integrate UMLS2013AB dictionary

The tokenizer that used a file of hyphenated words was replaced a while back 
with a tokenizer that implements the Penn Treebank tokenization rules 
(TokenizerPTB.java).
So as long as you use a recent copy of CreateLuceneIndexFromDelimitedFile, 
which references TokenizerPTB instead of just Tokenizer, you can ignore the 
part about a hyphenated.txt file.
-- James

From: Ramprasad Reddy [mailto:[email protected]]
Sent: Friday, May 16, 2014 4:20 PM
To: [email protected]
Subject: Help need to integrate UMLS2013AB dictionary

Hello,
Good evening.

I have been trying to add the latest UMLS2013AB data to the resources, as was 
done for UMLS2011AB. I tried to follow the instructions at the following 
locations:

  *   https://cabig-kc.nci.nih.gov/Vocab/forums/viewtopic.php?f=28&t=423
  *   https://cabig-kc.nci.nih.gov/Vocab/forums/viewtopic.php?f=28&t=80&start=20#p1459

I have already extracted the data from the UMLS website and created the 
pipe-delimited text file as well.
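
Before indexing, it can help to confirm that every line of the pipe-delimited file has the same number of fields. The following is only a minimal sketch for that check: the class name PipeDelimitedCheck, the sample values, and the expected field count are hypothetical, and the column layout that CreateLuceneIndexFromDelimitedFile actually expects should be taken from the instructions linked above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of cTAKES.
class PipeDelimitedCheck {

    // Split one line on the literal '|' character.
    // The -1 limit keeps trailing empty fields, which split() drops by default.
    static String[] splitFields(String line) {
        return line.split("\\|", -1);
    }

    // Return the 1-based numbers of the lines whose field count
    // differs from expectedFields.
    static List<Integer> findBadLines(List<String> lines, int expectedFields) {
        List<Integer> bad = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            if (splitFields(lines.get(i)).length != expectedFields) {
                bad.add(i + 1);
            }
        }
        return bad;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.out.println("usage: java PipeDelimitedCheck <file> <expectedFieldCount>");
            return;
        }
        // Assumes the file is UTF-8 encoded.
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        List<Integer> bad = findBadLines(lines, Integer.parseInt(args[1]));
        System.out.println(bad.isEmpty()
                ? "all lines OK"
                : "unexpected field count on lines " + bad);
    }
}
```

The pipe must be escaped as "\\|" because String.split takes a regular expression, in which a bare | means alternation.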

But it looks like the way tokenization is handled has changed (there is no 
hyphenated.txt).

I am getting a "no main class" error when running 
CreateLuceneIndexFromDelimitedFile.java. I am also looking for help with the 
steps to create the 'umls.data' and 'umls.backup' files in HSQLDB, similar to 
those for umls2011ab.
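
The cause of the "no main class" error is not stated here. In general, the Java launcher reports it when the named class is not on the classpath, or when a file name such as Foo.java is passed instead of the fully qualified class name. A minimal sketch of a check that can be run with the same classpath is shown below; MainClassCheck is a hypothetical helper, not part of cTAKES, and the class names to test are passed as arguments.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Hypothetical diagnostic helper, not part of cTAKES.
class MainClassCheck {

    // Report whether className can be loaded from the current classpath
    // and whether it has a public static void main(String[]) method.
    static String check(String className) {
        try {
            // initialize=false loads the class without running static initializers.
            Class<?> c = Class.forName(className, false,
                    MainClassCheck.class.getClassLoader());
            // getMethod only returns public methods.
            Method m = c.getMethod("main", String[].class);
            boolean usable = Modifier.isStatic(m.getModifiers())
                    && m.getReturnType() == void.class;
            return usable ? "OK" : "NO_MAIN";
        } catch (ClassNotFoundException | NoClassDefFoundError e) {
            return "NOT_ON_CLASSPATH";
        } catch (NoSuchMethodException e) {
            return "NO_MAIN";
        }
    }

    public static void main(String[] args) {
        // Each argument is a fully qualified class name, e.g. some.package.SomeClass.
        for (String name : args) {
            System.out.println(name + ": " + check(name));
        }
    }
}
```

A result of NOT_ON_CLASSPATH for the tool's fully qualified name would point to the -cp setting rather than to the tool itself.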

Any resources or steps you could share would be very helpful.

Thank you,

RP.
