You are correct: the information content (ic) files are specific to the
sources and relations you'd like to use. When the ic files are created,
counts of terms found in your text are propagated up whatever resource you
are using (following the relations you have given), so each different
combination of resources and relations will give you different ic values.
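
To make the propagation idea concrete, here is a toy Python sketch. The
hierarchy, the concept names, and the propagate() helper are all made up for
illustration, and this is not the code inside create-icpropagation.pl; it
only shows why changing the relations changes the totals that reach the
upper concepts.

```python
# Toy illustration only: hypothetical hierarchy and counts, not the
# algorithm used by create-icpropagation.pl.
from collections import defaultdict


def propagate(counts, parents):
    """Add each concept's own count to itself and to every ancestor once."""
    totals = defaultdict(int)
    for concept, count in counts.items():
        seen = set()
        stack = [concept]
        while stack:
            node = stack.pop()
            if node in seen:  # guards against diamonds and cycles
                continue
            seen.add(node)
            totals[node] += count
            stack.extend(parents.get(node, []))
    return dict(totals)


# Hypothetical child -> parents map, as a set of PAR relations might give.
parents = {"flu": ["infection"], "infection": ["disease"], "diabetes": ["disease"]}
counts = {"flu": 1, "diabetes": 1}

full = propagate(counts, parents)

# Same counts, but with the infection -> disease link left out.
partial = propagate(counts, {"flu": ["infection"], "diabetes": ["disease"]})

print(full["disease"], partial["disease"])
```

With the full hierarchy both observed terms contribute to "disease"; with
the link removed only one does, so any value derived from that total would
differ.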

And you are also correct that using the --config option is the way to
specify the sources and relations. In the simplest case the config files
are short text files with two main fields (SAB and REL). The following says
I'd like to use SNOMEDCT with PAR, CHD relations...

SAB :: include SNOMEDCT
REL :: include PAR, CHD

So, if this file was called snomedct.config, then you could use it like
this :

ted@maraca:~$ create-icfrequency.pl --config config/snomedct.config ic.out test.txt
Default Settings:
  --term

User Settings:
  --config config/snomedct.config


CuiFinder User Options:
   --config option set


UMLS-Interface Configuration Information
  Sources (SAB):
    SNOMEDCT
  Relations (REL):
    CHD
    PAR
  Database:
    umls (MMSYS-2013AA-20130404)



PathFinder User Options:
  --realtime option set

ted@maraca:~$ cat config/snomedct.config
SAB :: include SNOMEDCT
REL :: include PAR, CHD

ted@maraca:~$ cat test.txt
my diabetes is awful and I have the flu too
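
Since the config format is just those two lines, files like snomedct.config
could also be written by a small script. This is a minimal Python sketch and
not part of UMLS-Similarity: the helper names (config_text, write_configs),
the list of sources, and the temporary output directory are assumptions for
illustration only.

```python
# Sketch only: helper names, source list, and output location are
# assumptions; the two-line format is copied from snomedct.config.
import tempfile
from pathlib import Path


def config_text(sab, rels):
    """Return a two-field config in the SAB/REL format."""
    return f"SAB :: include {sab}\nREL :: include {', '.join(rels)}\n"


def write_configs(sabs, rels, outdir):
    """Write one <sab>.config file per source and return the paths."""
    outdir = Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)
    paths = []
    for sab in sabs:
        path = outdir / f"{sab.lower()}.config"
        path.write_text(config_text(sab, rels))
        paths.append(path)
    return paths


# Write into a throwaway directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    for path in write_configs(["SNOMEDCT", "MSH", "FMA"], ["PAR", "CHD"], tmp):
        print(path.name, "->", path.read_text().splitlines())
```

Each file written this way could then be passed to --config in the same way
as config/snomedct.config.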

This is the output file (ic.out) that was generated, with frequency counts.
I have omitted all the CUIs with a count of 0 (which is most of the concepts
in SNOMEDCT).

SAB :: include SNOMEDCT
REL :: include PAR, CHD
N :: 6
C0021400<>1
C0439068<>1
C0439135<>1
C0441913<>1
C1706104<>1
C1706368<>1
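
If you want to inspect a frequency file like this from a script, the layout
is easy to read back in. The following is a minimal Python sketch; the
parse_icfrequency() name is hypothetical and the format is inferred only
from the example above ("KEY :: value" header lines followed by "CUI<>count"
lines), so treat it as an assumption rather than a specification.

```python
# Sketch only: format inferred from the single example in this message.
def parse_icfrequency(text):
    """Split an icfrequency file into header fields and per-CUI counts."""
    header, counts = {}, {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if "<>" in line:
            cui, count = line.split("<>", 1)
            counts[cui] = int(count)
        elif "::" in line:
            key, value = line.split("::", 1)
            header[key.strip()] = value.strip()
    return header, counts


sample = """SAB :: include SNOMEDCT
REL :: include PAR, CHD
N :: 6
C0021400<>1
C0439068<>1
C0439135<>1
C0441913<>1
C1706104<>1
C1706368<>1
"""

header, counts = parse_icfrequency(sample)
nonzero = {cui: n for cui, n in counts.items() if n > 0}
print(header["SAB"], "|", header["REL"], "| N =", header["N"])
print(len(nonzero), "CUIs with non-zero counts, total", sum(nonzero.values()))
```

Filtering on n > 0 mirrors what I did by hand when trimming the zero-count
CUIs from the listing.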

More to come...
Ted



On Sat, Aug 16, 2014 at 7:55 PM, Steven Bethard beth...@cis.uab.edu
[umls-similarity] <umls-similarity@yahoogroups.com> wrote:

> On Aug 16, 2014, at 4:04 PM, Steven Bethard beth...@cis.uab.edu
> [umls-similarity] <umls-similarity@yahoogroups.com> wrote:
> > On Jul 30, 2014, at 9:55 AM, Bridget McInnes btmcin...@gmail.com
> [umls-similarity] <umls-similarity@yahoogroups.com> wrote:
> >> The icpropagation files need to go into the:
> >> /var/www/umls_similarity/icpropagation/
> > [snip]
> >> create-icfrequency.pl ICFREQUENCY_FILE INPUTFILE
> > [snip]
> >> create-icpropagation.pl ICPROPAGATION_FILE ICFREQUENCY_FILE
> >
> > Thanks, this solved the problem. Some notes for anyone else who has to
> do this:
> >
> > * The create-icfrequency.pl script took about 20 minutes on a text file
> of about 160M words.
> > * The create-icpropagation.pl script took about 10 minutes
> > * The icpropagation file has to be named
> /var/www/umls_similarity/icpropagation/icprop.msh.par.chd for the server to
> run
>
> Ok, it looks like this didn’t completely solve the problem because when I
> try sources other than MSH, I get errors like:
>
> "Could not open file
> /var/www/umls_similarity/icpropagation/icprop.fma.par.chd"
>
> How do I run the create-ic* scripts so that they generate all the
> different icprop.* files that the server might search for? It seemed like
> maybe I needed to use the --config option, but I couldn’t find the
> documentation on what a config file looks like. And, assuming someone can
> point me to the config file documentation, do I need to run the script once
> for each combination of MSH/FMA/OMIM/SNOMEDCT/UMLS_ALL, CUI/PAR/CHD/RB/RN?
> Is there a way to make sure I have all the possible combinations?
>
> Steve
>



-- 
Ted Pedersen
http://www.d.umn.edu/~tpederse