I am not sure how we populated the server with information content files,
but I suspect it was a fairly manual process in which the scripts were run
once for each combination of sources and relations that the web interface
supports. I didn't set that up myself, though, so perhaps Bridget has some
scripts that she used to help her out. I'll check on that and see what we have.
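
In the meantime, here is a minimal Python sketch of how that
per-combination run could be scripted. It is not the procedure actually
used on the server, only an illustration under stated assumptions. The
config format and the two script invocations are the ones quoted below in
this thread. The source list comes from Steve's question, the relation sets
are placeholders, the icprop.<source>.<relations> naming is a guess based on
the two file names in his error messages, and the .config and .icfreq file
names are made up for illustration.

```python
from itertools import product
from pathlib import Path

# Assumption: source names taken from Steve's question; the server's real list may differ.
SOURCES = ["MSH", "FMA", "OMIM", "SNOMEDCT"]
# Assumption: placeholder relation sets; use whatever the web interface offers.
RELATION_SETS = [("PAR", "CHD"), ("RB", "RN")]


def config_text(sab, rels):
    """Return config file contents in the SAB/REL format shown in this thread."""
    return f"SAB :: include {sab}\nREL :: include {', '.join(rels)}\n"


def icprop_name(sab, rels):
    """Guess the file name the server looks for, e.g. icprop.msh.par.chd."""
    return ".".join(["icprop", sab.lower(), *(r.lower() for r in rels)])


def plan(corpus, config_dir="config",
         out_dir="/var/www/umls_similarity/icpropagation"):
    """Build config texts and shell commands for every combination.

    Nothing is written or executed here; the caller decides what to do
    with the returned (configs, commands) pair.
    """
    configs = {}
    commands = []
    for sab, rels in product(SOURCES, RELATION_SETS):
        name = icprop_name(sab, rels)
        config_path = f"{config_dir}/{name}.config"  # hypothetical file name
        freq_path = f"{name}.icfreq"                 # hypothetical file name
        configs[config_path] = config_text(sab, rels)
        commands.append(
            f"create-icfrequency.pl --config {config_path} {freq_path} {corpus}"
        )
        commands.append(
            f"create-icpropagation.pl {out_dir}/{name} {freq_path}"
        )
    return configs, commands


def write_configs(configs):
    """Write each config text to its path, creating directories as needed."""
    for path, text in configs.items():
        target = Path(path)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(text)
```

Calling plan("corpus.txt") returns the config texts and the list of commands
so they can be inspected before anything is run; write_configs() then puts
the config files on disk. Given the run times Steve reported, it is probably
worth reviewing the list before running it on a large corpus.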

More soon,
Ted


On Mon, Aug 18, 2014 at 10:49 AM, Ted Pedersen <tpede...@d.umn.edu> wrote:

> Here's some documentation on the config option. This is actually found in
> the UMLS::Interface module, which underlies a lot of UMLS::Similarity. This
> unfortunately makes it a bit hard to find, but in general most details
> about anything other than the actual similarity measure calculation are
> found in UMLS::Interface.
>
> https://metacpan.org/pod/UMLS::Interface#CONFIGURATION-FILE
>
>
>
>
> On Mon, Aug 18, 2014 at 10:45 AM, Ted Pedersen <tpede...@d.umn.edu> wrote:
>
>> You are correct, the information content files are specific to the
>> sources and relations you'd like to be using. When the ic files are
>> created, counts of terms found in your text are propagated up whatever
>> resource you are using (following the relations you have given) so each
>> different combination of resources and relations will give you different ic
>> values.
>>
>> And you are also correct that using the --config option is the way to
>> specify the sources and relations. In the simplest case the config files
>> are short text files with two main fields (SAB and REL). The following says
>> I'd like to use SNOMEDCT with PAR, CHD relations...
>>
>> SAB :: include SNOMEDCT
>> REL :: include PAR, CHD
>>
>> So, if this file was called snomedct.config, then you could use it like
>> this:
>>
>> ted@maraca:~$ create-icfrequency.pl --config config/snomedct.config ic.out test.txt
>> Default Settings:
>>   --term
>>
>> User Settings:
>>   --config config/snomedct.config
>>
>>
>> CuiFinder User Options:
>>    --config option set
>>
>>
>> UMLS-Interface Configuration Information
>>   Sources (SAB):
>>     SNOMEDCT
>>   Relations (REL):
>>     CHD
>>     PAR
>>   Database:
>>     umls (MMSYS-2013AA-20130404)
>>
>>
>>
>> PathFinder User Options:
>>   --realtime option set
>>
>> ted@maraca:~$ cat config/snomedct.config
>> SAB :: include SNOMEDCT
>> REL :: include PAR, CHD
>>
>> ted@maraca:~$ cat test.txt
>> my diabetes is awful and I have the flu too
>>
>> This is the output file generated (with frequency counts). I have omitted
>> all the 0 counts for CUIs (which is most of the concepts in SNOMEDCT).
>>
>> SAB :: include SNOMEDCT
>> REL :: include PAR, CHD
>> N :: 6
>> C0021400<>1
>> C0439068<>1
>> C0439135<>1
>> C0441913<>1
>> C1706104<>1
>> C1706368<>1
>>
>> More to come...
>> Ted
>>
>>
>>
>> On Sat, Aug 16, 2014 at 7:55 PM, Steven Bethard beth...@cis.uab.edu
>> [umls-similarity] <umls-similarity@yahoogroups.com> wrote:
>>
>>> On Aug 16, 2014, at 4:04 PM, Steven Bethard beth...@cis.uab.edu
>>> [umls-similarity] <umls-similarity@yahoogroups.com> wrote:
>>> > On Jul 30, 2014, at 9:55 AM, Bridget McInnes btmcin...@gmail.com
>>> [umls-similarity] <umls-similarity@yahoogroups.com> wrote:
>>> >> The icpropagation files need to go into the:
>>> >> /var/www/umls_similarity/icpropagation/
>>> > [snip]
>>> >> create-icfrequency.pl ICFREQUENCY_FILE INPUTFILE
>>> > [snip]
>>> >> create-icpropagation.pl ICPROPAGATION_FILE ICFREQUENCY_FILE
>>> >
>>> > Thanks, this solved the problem. Some notes for anyone else who has to
>>> > do this:
>>> >
>>> > * The create-icfrequency.pl script took about 20 minutes on a text
>>> >   file of about 160M words.
>>> > * The create-icpropagation.pl script took about 10 minutes.
>>> > * The icpropagation file has to be named
>>> >   /var/www/umls_similarity/icpropagation/icprop.msh.par.chd for the
>>> >   server to run.
>>>
>>> Ok, it looks like this didn’t completely solve the problem because when
>>> I try sources other than MSH, I get errors like:
>>>
>>> "Could not open file
>>> /var/www/umls_similarity/icpropagation/icprop.fma.par.chd"
>>>
>>> How do I run the create-ic* scripts so that they generate all the
>>> different icprop.* files that the server might search for? It seemed like
>>> maybe I needed to use the --config option, but I couldn’t find the
>>> documentation on what a config file looks like. And, assuming someone can
>>> point me to the config file documentation, do I need to run the script once
>>> for each combination of MSH/FMA/OMIM/SNOMEDCT/UMLS_ALL, CUI/PAR/CHD/RB/RN?
>>> Is there a way to make sure I have all the possible combinations?
>>>
>>> Steve
>>>
>>
>>
>>
>> --
>> Ted Pedersen
>> http://www.d.umn.edu/~tpederse
>>