Hi again,

With fewer UMLS::Similarity objects it sped up. I was creating 14 objects; now that I'm working with just 3 (UMLS::Similarity::wup, UMLS::Similarity::lin, UMLS::Similarity::batet), the speed is much higher: about 10,000 comparisons per minute.
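For planning, the all-pairs workload can be estimated with simple arithmetic. This is a small Perl sketch. It assumes the ~14,000 CUIs mentioned in the 09/13 message quoted below, that every unordered pair is compared once, and that the ~10,000 comparisons per minute stays constant. Whether that rate counts one measure or all three is not stated, so treat it as an assumption. `pair_count` and `estimated_minutes` are hypothetical helpers:

```perl
use strict;
use warnings;

# Number of unordered pairs among $n concepts: n * (n - 1) / 2.
sub pair_count {
    my ($n) = @_;
    return $n * ($n - 1) / 2;
}

# Estimated minutes to compare all pairs at a fixed throughput
# (comparisons per minute). Assumes the rate stays constant.
sub estimated_minutes {
    my ($n, $per_minute) = @_;
    return pair_count($n) / $per_minute;
}

my $pairs   = pair_count(14_000);
my $minutes = estimated_minutes(14_000, 10_000);
printf "%d pairs, ~%.0f minutes (~%.1f days)\n",
    $pairs, $minutes, $minutes / (60 * 24);
```

Under those assumptions it prints `97993000 pairs, ~9799 minutes (~6.8 days)`. Even at the improved rate, an exhaustive pass over 14,000 CUIs would therefore take on the order of a week.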

Now I'm trying to create a bigger index with several sources (HPO, ICD9CM, MSH, MTH, MTHICD9, NCI, OMIM) instead of just MeSH, and I'm having a performance issue. After 6 days of processing, I have this:

hex                                        TABLENAME                                                                          TABLE_ROWS (millions)  Database size (MB)
a3fb2fd960e5ccdfd7901543d1b081a7f8f58fca2  MMSYS_2015AB_20151109_HPO_ICD9CM_MSH_MTH_MTHICD9_NCI_OMIM_CHD_PAR_RB_RN_cache      0                      0.00097656
ac7ef1dade78ec5131826b9395e60ca2f734b58b2  MMSYS_2015AB_20151109_HPO_ICD9CM_MSH_MTH_MTHICD9_NCI_OMIM_CHD_PAR_RB_RN_child      0.036221               0.5882082
a3f8a21d616f3e6024b0c692b1c923bd3dcc5ec77  MMSYS_2015AB_20151109_HPO_ICD9CM_MSH_MTH_MTHICD9_NCI_OMIM_CHD_PAR_RB_RN_info       0                      0.00097656
a31e6e52e6ddc9735506c3a34daa4715a4c8bb759  MMSYS_2015AB_20151109_HPO_ICD9CM_MSH_MTH_MTHICD9_NCI_OMIM_CHD_PAR_RB_RN_intrinsic  0.353009               9.92724705
a05d031944a25b267d67b2f1be368fb4845993b9d  MMSYS_2015AB_20151109_HPO_ICD9CM_MSH_MTH_MTHICD9_NCI_OMIM_CHD_PAR_RB_RN_parent     0.036221               0.5882082
a04e21d270bcd16e6d7eded6cd6acbefdce193a61  MMSYS_2015AB_20151109_HPO_ICD9CM_MSH_MTH_MTHICD9_NCI_OMIM_CHD_PAR_RB_RN_table      1242.790321            500697.10636902


As you can see, the "table" table is huge and still growing. The intrinsic table is growing too, but more slowly. The parent and child tables stay stable at 36,221 rows. My questions are:
1. What is stored in "table"? All possible paths?
I have seen that there are contradictory relationships between some CUIs; that is, CUI1 can be both a parent and a child of CUI2, depending on the source. For instance:

CUI1      CUI2      CUI1 name                     CUI2 name  a_rel  a_sab  b_rel  b_sab
C2607914  C0018621  Allergic rhinitis (disorder)  Hay fever  RB     MTH    CHD    NCI
C2607914  C0018621  Allergic rhinitis (disorder)  Hay fever  RB     MTH    CHD    MSH
C2607914  C0018621  Allergic rhinitis (disorder)  Hay fever  RB     MTH    CHD    ICD9CM


Or this other case, which shows that ICD9CM itself contains cyclic relationships:

CUI1      CUI2      CUI1 name                      CUI2 name   a_rel  a_sab   b_rel  b_sab
C0751362  C0027404  Narcolepsy-Cataplexy Syndrome  Narcolepsy  PAR    ICD9CM  CHD    ICD9CM
C0751362  C0027404  Narcolepsy-Cataplexy Syndrome  Narcolepsy  RB     MTH     CHD    ICD9CM


2. Since it worked with just one source (MeSH), I wonder whether these contradictory relationships could break index creation and path calculations. How does UMLS::Interface resolve these relationship conflicts?
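One way to gauge how widespread these conflicts are before building an index is to look for CUI pairs where one source places A above B and another (or the same) source places B above A. This is a minimal plain-Perl sketch. It assumes the relations have already been extracted as (CUI1, REL, CUI2, SAB) rows, for example from MRREL. It assumes PAR/RB put CUI2 above CUI1 and CHD/RN put it below. The set of conflicting pairs found does not depend on which way round that convention is. `find_conflicts` and the row layout are hypothetical. This says nothing about how UMLS::Interface itself handles such cycles:

```perl
use strict;
use warnings;

# Position of CUI2 relative to CUI1 for each hierarchical REL value.
# Assumption: PAR/RB put CUI2 above CUI1, CHD/RN put it below.
my %DIRECTION = (PAR => 'up', RB => 'up', CHD => 'down', RN => 'down');

# Takes rows of [CUI1, REL, CUI2, SAB] and returns one line per concept
# pair that is placed both above and below the other (by one or more SABs).
sub find_conflicts {
    my @rows = @_;
    my %above;    # $above{"X Y"}{SAB} = 1 means SAB puts X above Y
    for my $row (@rows) {
        my ($cui1, $rel, $cui2, $sab) = @$row;
        my $dir = $DIRECTION{$rel} or next;    # ignore non-hierarchical RELs
        my ($hi, $lo) = $dir eq 'up' ? ($cui2, $cui1) : ($cui1, $cui2);
        $above{"$hi $lo"}{$sab} = 1;
    }
    my @conflicts;
    for my $key (sort keys %above) {
        my ($hi, $lo) = split / /, $key;
        next unless $hi lt $lo;                # report each pair only once
        my $reverse = $above{"$lo $hi"} or next;
        push @conflicts, sprintf '%s above %s per %s; below per %s',
            $hi, $lo,
            join(',', sort keys %{ $above{$key} }),
            join(',', sort keys %{$reverse});
    }
    return @conflicts;
}
```

With the three Narcolepsy rows above (PAR/ICD9CM, CHD/ICD9CM, RB/MTH), it returns a single line: `C0027404 above C0751362 per ICD9CM,MTH; below per ICD9CM`.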

Thanks,
Emilio

On 09/13/2016 03:04 PM, Emilio Centeno Ortiz wrote:
I also noticed that tables in umlsinterfaceindex are being modified while my program is calculating res (Resnik). Maybe I misunderstood, but I thought the umlsinterfaceindex database was computed and written just once, and from then on only read for future queries. Is the software behaving as intended?

Thank you in advance,
Emilio

On 09/13/2016 10:51 AM, Emilio Centeno Ortiz ecent...@imim.es [umls-similarity] wrote:

We finally gave up on computing the index for all the UMLS sources. We restricted the analysis to MeSH with PAR/CHD and RB/RN relationships, and the indexing part worked fine.

I'm now trying to compute some similarity measures, but each calculation takes a long time (2-17 min). So I tried applying this tuning to my MySQL server:
https://www.nlm.nih.gov/research/umls/implementation_resources/scripts/README_RRF_MySQL_Output_Stream.html
as suggested in the forum:
https://www.mail-archive.com/umls-similarity@yahoogroups.com/msg00172.html

After the MySQL tuning, the computation speed did not improve.

This is the content of my config file:

SAB :: include MSH
REL :: include RB, RN, PAR, CHD

and this is the piece of code that I'm running:

     use strict;
     use warnings;
     use UMLS::Interface;
     use UMLS::Similarity::lch;

     # connection and configuration parameters
     my %params = ();
     $params{"database"}  = "umls_2015AB";
     $params{"username"}  = "emilio";
     $params{"password"}  = "mypassword";
     $params{"hostname"}  = "localhost";
     $params{"port"}      = "3306";
     $params{"config"}    = "/home/emilio/workspace/umls_relatedness/test_config_umls-interface-package.cfg";
     $params{"intrinsic"} = "sanchez";

     my $umls = UMLS::Interface->new(\%params);
     die "Unable to create UMLS::Interface object.\n" if (!$umls);
     my $lch = UMLS::Similarity::lch->new($umls, \%params);
     die "Unable to create measure object.\n" if (!$lch);
     my $lchValue = $lch->getRelatedness("C0018814", "C0003113");

I already created the umlsinterfaceindex for MSH, so I'm not running it in realtime mode.

Since I need to compute many similarity measures over a lot of CUIs (~14,000), I wonder whether I'm doing something wrong or, if not, whether I can do anything to speed up the process.

Emilio


On 08/30/2016 06:53 PM, Albert Max Lai albert.max....@gmail.com [umls-similarity] wrote:
This sounds a little like what I ran into a while back (https://www.mail-archive.com/umls-similarity@yahoogroups.com/msg00353.html). I would make sure that the UMLS user has the INDEX privilege on the umlsinterfaceindex database. The user account needs SELECT, INSERT, DELETE, CREATE, DROP, and INDEX privileges.

Without the INDEX privilege, it seemed like the index database just kept getting bigger and bigger.
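To check this quickly, one can compare the output of MySQL's `SHOW GRANTS FOR CURRENT_USER` against the six privileges listed above. This is a minimal sketch. It assumes grant lines of the usual form ``GRANT ... ON `db`.* TO ...`` and deliberately ignores table-level, column-level, and role grants. `missing_privileges` is a hypothetical helper:

```perl
use strict;
use warnings;

# Privileges the UMLS user needs on the index database (list above).
my @REQUIRED = qw(SELECT INSERT DELETE CREATE DROP INDEX);

# Given the lines returned by "SHOW GRANTS FOR CURRENT_USER", return the
# required privileges not granted on $db or globally on *.*.
sub missing_privileges {
    my ($db, @grant_lines) = @_;
    my %have;
    for my $line (@grant_lines) {
        # e.g. GRANT SELECT, INDEX ON `umlsinterfaceindex`.* TO 'u'@'h'
        next unless $line =~ /^GRANT\s+(.+?)\s+ON\s+`?([^`.\s]+|\*)`?\.\*/i;
        my ($privs, $scope) = ($1, $2);
        next unless $scope eq '*' || $scope eq $db;
        if ($privs =~ /^ALL(?:\s+PRIVILEGES)?$/i) {
            $have{$_} = 1 for @REQUIRED;
        } else {
            $have{ uc $_ } = 1 for split /\s*,\s*/, $privs;
        }
    }
    return grep { !$have{$_} } @REQUIRED;
}
```

With an open DBI handle, the lines can be fetched with `$dbh->selectcol_arrayref('SHOW GRANTS FOR CURRENT_USER')` and passed as `missing_privileges('umlsinterfaceindex', @$lines)`. An empty result means none of the six is missing at the database or global level.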

-Albert


On Aug 30, 2016, at 9:18 AM, Ted Pedersen duluth...@gmail.com [umls-similarity] <umls-similarity@yahoogroups.com> wrote:

Building an index for all the UMLS sources can be time-consuming. I don't know that I've ever actually done that, since some of the sources probably aren't going to be relevant (depending on the nature of your data and experiments). Three weeks isn't totally surprising, although it seems long. I might suggest building up to 44 sources gradually: start with 1 or 2, see how long that takes, then try 5 or 6, etc., so that you have a sense of how much longer more sources will take. Also, you may want to be more selective about which sources you include, just to keep things more efficient. I hope this helps!
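The step-by-step approach can be scripted by generating one configuration file per step, so that each index build can be timed separately. The files use the same `SAB :: include ...` / `REL :: include ...` format shown elsewhere in this thread. This is a minimal sketch. The subroutine `config_for`, the order in which sources are added, and the step sizes are hypothetical choices. Each printed configuration would be saved to a file and passed to UMLS::Interface through its `config` parameter:

```perl
use strict;
use warnings;

# Return the text of a configuration file in the "SAB :: include ..." /
# "REL :: include ..." format, for the given sources and relations.
sub config_for {
    my ($sabs, $rels) = @_;
    return 'SAB :: include ' . join(', ', @$sabs) . "\n"
         . 'REL :: include ' . join(', ', @$rels) . "\n";
}

# Hypothetical order in which sources are added; adjust to your needs.
my @sources = qw(MSH ICD9CM HPO OMIM MTHICD9 NCI MTH);
my @rels    = qw(RB RN PAR CHD);

# One configuration per step (1, 2, 5, then all sources), printed so each
# can be saved to its own file and used for a separate, timed index build.
for my $n (1, 2, 5, scalar @sources) {
    my @subset = @sources[0 .. $n - 1];
    print "==> config_${n}_sources.cfg <==\n", config_for(\@subset, \@rels), "\n";
}
```

Comparing the build times of the successive steps gives a rough basis for extrapolating to larger source sets, which is the point of the suggestion above.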

Good luck,
Ted

On Tue, Aug 30, 2016 at 6:06 AM, Emilio Centeno Ortiz ecent...@imim.es [umls-similarity] <umls-similarity@yahoogroups.com> wrote:

    Hi Ted,

    We included all 44 UMLS sources in the configuration file,
    and the indexing process has been running for 3 weeks (it is
    still running). The umlsinterfaceindex database is currently
    using 117 GB of disk space.

    We included all these sources because, once indexing is
    over, we want to test some similarity measures between concepts
    of the same semantic type, for several different semantic types
    (which implies different sources), e.g. comparing diseases with
    diseases, compounds with compounds, etc. This way we do the
    indexing just once and can run any comparison we want
    without having to rebuild the umlsinterfaceindex database
    every time we switch sources.

    Was I wrong to include so many sources? Will the package
    work with so many sources?

    Thanks in advance,

    Emilio



    On 08/08/2016 04:44 AM, Ted Pedersen duluth...@gmail.com
    [umls-similarity] wrote:
    The time to build an index really varies quite a bit, and
    depends both on the hardware you have available and the
    particular sources and relations you are using. If possible, be
    patient and let the index build finish, since that really does
    speed up queries. And I hope your index has already been built
    by the time you read this!

    Good luck,
    Ted

    On Thu, Aug 4, 2016 at 8:57 AM, Emilio Centeno Ortiz
    ecent...@imim.es [umls-similarity]
    <umls-similarity@yahoogroups.com> wrote:

        Hi Ted,

        Thanks for the reply. I finally got it to work.

        I didn't know that the UMLS Semantic Network was required
        to initialize the UMLS::Interface object. Since we don't
        have the Semantic Network for umls_2016AA, I just switched
        to umls_2015AB. Now it is working.

        Now that I'm running a test, the package is creating an
        index database that will eventually save time in future
        queries. How long will this take?

        I hope I will ask about more interesting issues shortly :)

        Thanks again,

        Emilio




        On 08/04/2016 03:30 PM, Ted Pedersen duluth...@gmail.com
        [umls-similarity] wrote:
        Hi Emilio,

        I'm afraid I haven't seen this error before. It looks
        like a fairly generic Perl DBI error, so I wonder whether
        everything is working OK with that module. If possible, it
        might be good to run the DBI tests again just to make
        sure it is installed and working OK. Please let us know
        what you find if you are able to do that...

        Sorry I can't be more specific, but keep us posted and we
        might be able to do more...

        Good luck,
        Ted

        On Tue, Aug 2, 2016 at 2:21 AM, Emilio Centeno Ortiz
        ecent...@imim.es [umls-similarity]
        <umls-similarity@yahoogroups.com> wrote:

            Hi again,

            Now that our sysadmin has granted full working access
            to the UMLS database, I'm trying to create the
            UMLS::Interface object:

            my $umls = UMLS::Interface->new({"driver"   => "mysql",
                                             "database" => "umls_2016AA",
                                             "username" => "myusername",
                                             "password" => "mypassword",
                                             "hostname" => "localhost",
                                             "port"     => "3306"});


            But I get this output:

            UMLS-Interface Configuration Information:
            (Default Information - no config file)
               Sources (SAB):
                  MSH
               Relations (REL):
                  PAR
                  CHD
               Sources (SABDEF):
                  UMLS_ALL
               Relations (RELDEF):
                  UMLS_ALL

            ERROR: UMLS::Interface::STFinder->_loadSemanticNetwork
            Database error (Error Code 1).
            Error executing database query: DBI::st=HASH(0x3b643e8)->errstr()).


            Any hint about this?

            Thanks in advance,

            Emilio

            On 07/30/2016 03:18 AM, juliana md
            julian...@gmail.com [umls-similarity] wrote:

            Hi Emilio,

            Did you grant access to myusername@mylocalmachine to
            your umls_2016AA database?
            Are you able to connect to your database from
            mylocalmachine (using MySQL Workbench, for example)
            with those credentials?

            Regards,
            Juliana

            On Jul 29, 2016 21:05, "Emilio Centeno Ortiz
            ecent...@imim.es [umls-similarity]"
            <umls-similarity@yahoogroups.com> wrote:

                Hello,

                I have just installed the UMLS::Similarity
                package and copy-pasted the example code. Since
                the MySQL server is hosted on another machine
                (172.20.16.15), I tried to initialize the
                interface with our connection parameters:

                use strict;
                use warnings;
                use UMLS::Interface;
                use UMLS::Similarity::lch;
                use UMLS::Similarity::path;

                my $umls = UMLS::Interface->new({"driver"   => "mysql",
                                                 "database" => "umls_2016AA",
                                                 "username" => "myusername",
                                                 "password" => "mypassword",
                                                 "hostname" => "myMySQLHostIP",
                                                 "port"     => "3306"});
                die "Unable to create UMLS::Interface object.\n" if(!$umls);

                my $lch = UMLS::Similarity::lch->new($umls);
                die "Unable to create measure object.\n" if(!$lch);
                my $path = UMLS::Similarity::path->new($umls);
                die "Unable to create measure object.\n" if(!$path);

                my $cui1 = "C0005767";
                my $cui2 = "C0007634";

                # getTermList returns an array reference of terms for a CUI
                my $ts1 = $umls->getTermList($cui1);
                my $term1 = pop @{$ts1};
                my $ts2 = $umls->getTermList($cui2);
                my $term2 = pop @{$ts2};

                my $lvalue = $lch->getRelatedness($cui1, $cui2);
                my $pvalue = $path->getRelatedness($cui1, $cui2);
                print "The lch similarity between $cui1 ($term1) and $cui2 ($term2) is $lvalue\n";
                print "The path similarity between $cui1 ($term1) and $cui2 ($term2) is $pvalue\n";


                but it complains like that:

                DBI connect('database=umls_2016AA;mysql_socket=/var/run/mysqld/mysqld.sock;host=myMySQLHostIP','myusername',...) failed: Access denied for user 'myusername'@'mylocalmachinename' (using password: YES) at /soft/devel/perl-5.16.3/lib/site_perl/5.16.3/UMLS/Interface/CuiFinder.pm line 2458.

                Can't call method "err" on an undefined value at /soft/devel/perl-5.16.3/lib/site_perl/5.16.3/UMLS/Interface/ErrorHandler.pm line 113.


                I have just replaced hostnames, user, etc. with
                "my..." names.
                It looks like it tries to connect to MySQL using
                sockets? Any advice about how I could overcome
                this issue?
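"Access denied for user ..." is a reply from the MySQL server itself. That suggests the connection did reach MySQL and the problem is the permission for that user and client host, which is what Juliana asks about above. One way to test the same credentials independently of UMLS::Interface is a plain DBI connection over TCP. This is a minimal sketch. It assumes DBI and DBD::mysql are installed. The helper names `mysql_dsn` and `try_connect` are hypothetical:

```perl
use strict;
use warnings;

# Build a DBD::mysql data source name from UMLS::Interface-style parameters.
# With a host other than "localhost", the MySQL client normally uses TCP.
sub mysql_dsn {
    my (%p) = @_;
    return "DBI:mysql:database=$p{database};host=$p{hostname};port=$p{port}";
}

# Try to connect with plain DBI and return "connected" or the server's error.
sub try_connect {
    my (%p) = @_;
    require DBI;    # loaded at run time so mysql_dsn() works without DBI
    my $dbh = DBI->connect(mysql_dsn(%p), $p{username}, $p{password},
                           { RaiseError => 0, PrintError => 0 });
    no warnings 'once';
    return $dbh ? 'connected' : "failed: $DBI::errstr";
}
```

For example: `print try_connect(database => 'umls_2016AA', hostname => 'myMySQLHostIP', port => 3306, username => 'myusername', password => 'mypassword'), "\n";`. If this also reports "Access denied", the fix most likely belongs on the MySQL server, as a GRANT for that user at the client's host, rather than in UMLS::Interface.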

                Thanks in advance,
                Emilio

-- Emilio Centeno Ortiz

                Research Programme on Biomedical Informatics (GRIB)
                Department of Experimental and Health Sciences
                Universitat Pompeu Fabra
                IMIM (Hospital del Mar Medical Research Institute)
                C/ Dr. Aiguader, 88
                Barcelona, Spain
                Tel.: +34 93 316 0536
                E-mail: ecent...@imim.es
                http://ibi.imim.es

