You are correct: the information content (ic) files are specific to the sources and relations you'd like to use. When the ic files are created, counts of terms found in your text are propagated up whatever resource you are using (following the relations you have given), so each different combination of resources and relations will give you different ic values.
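To make the propagation idea concrete, here is a minimal Python sketch. It is not the code in create-icpropagation.pl: the toy hierarchy, the concept names, and the function names are all made up for illustration, and the IC formula is the usual -log(probability) definition, which may differ in detail (smoothing, for example) from what the package computes.

```python
import math
from collections import defaultdict

# Hypothetical toy hierarchy: concept -> list of parents (PAR relations).
# A real run would follow the SAB/REL choices given in the config file.
PARENTS = {
    "flu": ["infection"],
    "diabetes": ["metabolic_disease"],
    "infection": ["disease"],
    "metabolic_disease": ["disease"],
    "disease": [],
}


def ancestors(concept, parents):
    """Return every concept reachable from `concept` via parent links."""
    seen = set()
    stack = list(parents.get(concept, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, []))
    return seen


def propagate(freq, parents):
    """Add each concept's own count to itself and to all of its ancestors."""
    total = defaultdict(int)
    for concept, count in freq.items():
        total[concept] += count
        for anc in ancestors(concept, parents):
            total[anc] += count
    return dict(total)


def information_content(propagated, n):
    """IC = -log(count / n) for each concept with a non-zero count."""
    return {c: -math.log(k / n) for c, k in propagated.items() if k > 0}


# Raw counts, as if "flu" and "diabetes" were each seen once in the text.
freq = {"flu": 1, "diabetes": 1}
propagated = propagate(freq, PARENTS)
ic = information_content(propagated, sum(freq.values()))
```

With a different PARENTS table (that is, a different source or set of relations), the same raw counts land on different ancestors, which is why each combination needs its own ic file.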
And you are also correct that using the --config option is the way to specify the sources and relations. In the simplest case the config files are short text files with two main fields (SAB and REL). The following says I'd like to use SNOMEDCT with PAR, CHD relations...

SAB :: include SNOMEDCT
REL :: include PAR, CHD

So, if this file was called snomedct.config, then you could use it like this:

ted@maraca:~$ create-icfrequency.pl --config config/snomedct.config ic.out test.txt
Default Settings:
  --term
User Settings:
  --config config/snomedct.config
CuiFinder User Options:
  --config option set
UMLS-Interface Configuration Information
  Sources (SAB):
    SNOMEDCT
  Relations (REL):
    CHD
    PAR
  Database:
    umls (MMSYS-2013AA-20130404)
PathFinder User Options:
  --realtime option set

ted@maraca:~$ cat config/snomedct.config
SAB :: include SNOMEDCT
REL :: include PAR, CHD

ted@maraca:~$ cat test.txt
my diabetes is awful and I have the flu too

This is the output file generated (with frequency counts). I have omitted all the 0 counts for CUIs (which is most of the concepts in SNOMEDCT).

SAB :: include SNOMEDCT
REL :: include PAR, CHD
N :: 6
C0021400<>1
C0439068<>1
C0439135<>1
C0441913<>1
C1706104<>1
C1706368<>1

More to come...
Ted

On Sat, Aug 16, 2014 at 7:55 PM, Steven Bethard beth...@cis.uab.edu [umls-similarity] <umls-similarity@yahoogroups.com> wrote:

> On Aug 16, 2014, at 4:04 PM, Steven Bethard beth...@cis.uab.edu
> [umls-similarity] <umls-similarity@yahoogroups.com> wrote:
> > On Jul 30, 2014, at 9:55 AM, Bridget McInnes btmcin...@gmail.com
> > [umls-similarity] <umls-similarity@yahoogroups.com> wrote:
> >> The icpropagation files need to go into the:
> >> /var/www/umls_similarity/icpropagation/
> > [snip]
> >> create-icfrequency.pl ICFREQUENCY_FILE INPUTFILE
> > [snip]
> >> create-icpropagation.pl ICPROPAGATION_FILE ICFREQUENCY_FILE
> >
> > Thanks, this solved the problem. Some notes for anyone else who has to
> > do this:
> >
> > * The create-icfrequency.pl script took about 20 minutes on a text file
> > of about 160M words.
> > * The create-icpropagation.pl script took about 10 minutes
> > * The icpropagation file has to be named
> > /var/www/umls_similarity/icpropagation/icprop.msh.par.chd for the server to
> > run
>
> Ok, it looks like this didn’t completely solve the problem because when I
> try sources other than MSH, I get errors like:
>
> "Could not open file
> /var/www/umls_similarity/icpropagation/icprop.fma.par.chd"
>
> How do I run the create-ic* scripts so that they generate all the
> different icprop.* files that the server might search for? It seemed like
> maybe I needed to use the --config option, but I couldn’t find the
> documentation on what a config file looks like. And, assuming someone can
> point me to the config file documentation, do I need to run the script once
> for each combination of MSH/FMA/OMIM/SNOMEDCT/UMLS_ALL, CUI/PAR/CHD/RB/RN?
> Is there a way to make sure I have all the possible combinations?
>
> Steve
>

--
Ted Pedersen
http://www.d.umn.edu/~tpederse