Hi again,
With fewer UMLS::Similarity objects, it sped up. I was creating 14
objects, and now that I'm only working with 3 (UMLS::Similarity::wup,
UMLS::Similarity::lin, UMLS::Similarity::batet), the speed is much
higher: about 10,000 comparisons per minute.
Now I'm trying to create a bigger index with several sources
(HPO, ICD9CM, MSH, MTH, MTHICD9, NCI, OMIM) instead of just MeSH, and I'm
having a performance issue. After 6 days of processing, this is what I have:
hex | TABLENAME | TABLE_ROWS (millions) | Database size (MB)
a3fb2fd960e5ccdfd7901543d1b081a7f8f58fca2 | MMSYS_2015AB_20151109_HPO_ICD9CM_MSH_MTH_MTHICD9_NCI_OMIM_CHD_PAR_RB_RN_cache | 0 | 0.00097656
ac7ef1dade78ec5131826b9395e60ca2f734b58b2 | MMSYS_2015AB_20151109_HPO_ICD9CM_MSH_MTH_MTHICD9_NCI_OMIM_CHD_PAR_RB_RN_child | 0.036221 | 0.5882082
a3f8a21d616f3e6024b0c692b1c923bd3dcc5ec77 | MMSYS_2015AB_20151109_HPO_ICD9CM_MSH_MTH_MTHICD9_NCI_OMIM_CHD_PAR_RB_RN_info | 0 | 0.00097656
a31e6e52e6ddc9735506c3a34daa4715a4c8bb759 | MMSYS_2015AB_20151109_HPO_ICD9CM_MSH_MTH_MTHICD9_NCI_OMIM_CHD_PAR_RB_RN_intrinsic | 0.353009 | 9.92724705
a05d031944a25b267d67b2f1be368fb4845993b9d | MMSYS_2015AB_20151109_HPO_ICD9CM_MSH_MTH_MTHICD9_NCI_OMIM_CHD_PAR_RB_RN_parent | 0.036221 | 0.5882082
a04e21d270bcd16e6d7eded6cd6acbefdce193a61 | MMSYS_2015AB_20151109_HPO_ICD9CM_MSH_MTH_MTHICD9_NCI_OMIM_CHD_PAR_RB_RN_table | 1242.790321 | 500697.10636902
As you can see, the "table" table is huge and still growing. The intrinsic
table is growing too, but more slowly. The parent and child tables stay
stable at 36,221 rows. My questions are:
1. What is stored in "table"? All possible paths?
I have seen that there are contradictory relationships between some
CUIs, i.e. a CUI1 can be both a parent and a child of a CUI2,
depending on the source. For instance:
CUI1 | CUI2 | diseaseName1 | diseaseName2 | a_rel | a_sab | b_rel | b_sab
C2607914 | C0018621 | Allergic rhinitis (disorder) | Hay fever | RB | MTH | CHD | NCI
C2607914 | C0018621 | Allergic rhinitis (disorder) | Hay fever | RB | MTH | CHD | MSH
C2607914 | C0018621 | Allergic rhinitis (disorder) | Hay fever | RB | MTH | CHD | ICD9CM
or this other case, where we can see that ICD9CM contains cyclic
relationships:
CUI1 | CUI2 | diseaseName1 | diseaseName2 | a_rel | a_sab | b_rel | b_sab
C0751362 | C0027404 | Narcolepsy-Cataplexy Syndrome | Narcolepsy | PAR | ICD9CM | CHD | ICD9CM
C0751362 | C0027404 | Narcolepsy-Cataplexy Syndrome | Narcolepsy | RB | MTH | CHD | ICD9CM
2. Since it worked with just one source (MeSH), I wonder whether these
contradictory relationships can disrupt the index creation and path
calculations. How does UMLS::Interface resolve these relationship
conflicts?
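A quick way to check extracted relations for the kind of conflict described above is a depth-first search for cycles. The following is a minimal pure-Perl sketch, not part of UMLS::Interface and not a description of how the package itself resolves conflicts; the helper names `has_cycle` and `_visit` are hypothetical. It assumes you have already loaded parent-to-child pairs into a hash of CUI => [child CUIs]. Whether the Narcolepsy rows really form a cycle depends on how the a_rel/b_rel columns are oriented, which the thread doesn't spell out, so the toy map below simply assumes the two sources point in opposite directions.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of UMLS::Interface): depth-first search over a
# parent-to-child map. Returns 1 if some CUI is reachable from itself, else 0.
sub has_cycle {
    my ($children) = @_;                 # hash ref: parent CUI => [child CUIs]
    my %state;                           # 1 = on current DFS path, 2 = done
    for my $start (sort keys %$children) {
        return 1 if _visit($start, $children, \%state);
    }
    return 0;
}

sub _visit {
    my ($cui, $children, $state) = @_;
    my $s = $state->{$cui} // 0;
    return 1 if $s == 1;                 # back edge: we came back to the current path
    return 0 if $s == 2;                 # already fully explored, no cycle through it
    $state->{$cui} = 1;
    for my $child (@{ $children->{$cui} || [] }) {
        return 1 if _visit($child, $children, $state);
    }
    $state->{$cui} = 2;
    return 0;
}

# Toy map: two CUIs pointing at each other (orientation assumed for illustration).
my %edges = (
    C0027404 => ['C0751362'],
    C0751362 => ['C0027404'],
);
print has_cycle(\%edges) ? "cycle found\n" : "no cycle\n";
```

A diamond (two parents sharing a child) is not reported as a cycle, because a node marked "done" is skipped rather than treated as a back edge.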
Thanks,
Emilio
On 09/13/2016 03:04 PM, Emilio Centeno Ortiz wrote:
I also noticed that tables in umlsinterfaceindex are being modified
while my program is calculating res (Resnik). Maybe I misunderstood,
but I thought that umlsinterfaceindex was computed and written
just once, and from then on only read for future
queries. Is the software behaving as intended?
Thank you in advance,
Emilio
On 09/13/2016 10:51 AM, Emilio Centeno Ortiz ecent...@imim.es
[umls-similarity] wrote:
We finally gave up on computing the index for all the UMLS sources. We
restricted the analysis to MeSH and the PAR/CHD and RB/RN relationships,
and the indexing part worked fine.
I'm now trying to compute some similarity measures, but each calculation
takes a long time (2-17 min). So I tried this tuning
https://www.nlm.nih.gov/research/umls/implementation_resources/scripts/README_RRF_MySQL_Output_Stream.html
on my MySQL server, as suggested in the forum
https://www.mail-archive.com/umls-similarity@yahoogroups.com/msg00172.html
After the MySQL tuning, the computing speed did not improve.
This is the content of my config file:
SAB :: include MSH
REL :: include RB, RN, PAR, CHD
and this is the piece of code that I'm running:
use strict;
use warnings;
use UMLS::Interface;
use UMLS::Similarity::lch;
my %params = ();
$params{"database"} = "umls_2015AB";
$params{"username"} = "emilio";
$params{"password"} = "mypassword";
$params{"hostname"} = "localhost";
$params{"port"} = "3306";
$params{"config"} = "/home/emilio/workspace/umls_relatedness/test_config_umls-interface-package.cfg";
$params{"intrinsic"} = "sanchez";
my $umls = UMLS::Interface->new(\%params);
die "Unable to create UMLS::Interface object.\n" if(!$umls);
my $lch = UMLS::Similarity::lch->new($umls,\%params);
die "Unable to create measure object.\n" if(!$lch);
my $lchValue = $lch->getRelatedness("C0018814", "C0003113");
print "lch(C0018814, C0003113) = $lchValue\n";
I have already created the umlsinterfaceindex for MSH, so I'm not running
it in realtime mode.
Since I need to compute many similarity measures over a lot of CUIs
(~14,000), I wonder whether I'm doing something wrong or, if not, whether
I can do something to speed up the process.
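If the same CUI pairs come up more than once (for example both (a, b) and (b, a) in an all-pairs run over ~14,000 CUIs), caching results avoids repeating the database work. Below is a minimal sketch of such a cache; `make_cached_relatedness` is a hypothetical helper, not part of UMLS::Similarity, and it assumes the wrapped measure is symmetric, which should be confirmed for each measure used. It does nothing to speed up a single slow getRelatedness call.

```perl
use strict;
use warnings;

# Hypothetical wrapper (not part of UMLS::Similarity): memoizes a relatedness
# function by the unordered CUI pair, so (a, b) and (b, a) are computed once.
# Assumes the wrapped measure is symmetric.
sub make_cached_relatedness {
    my ($measure) = @_;                  # code ref taking ($cui1, $cui2)
    my %cache;
    return sub {
        my ($cui1, $cui2) = @_;
        my $key = join '|', sort { $a cmp $b } ($cui1, $cui2);
        $cache{$key} = $measure->($cui1, $cui2) unless exists $cache{$key};
        return $cache{$key};
    };
}

# Stand-in measure that counts how often it is really called. With a real
# measure object you would pass sub { $lch->getRelatedness(@_) } instead.
my $calls = 0;
my $stub  = sub { $calls++; return 0.5 };
my $rel   = make_cached_relatedness($stub);

$rel->('C0018814', 'C0003113');
$rel->('C0003113', 'C0018814');          # same pair, answered from the cache
print "underlying calls: $calls\n";      # prints "underlying calls: 1"
```

The cache lives in memory for the life of the process, so for very large runs you may want to write results out periodically rather than rely on it alone.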
Emilio
On 08/30/2016 06:53 PM, Albert Max Lai albert.max....@gmail.com
[umls-similarity] wrote:
This sounds a little like what I ran into a while back
(https://www.mail-archive.com/umls-similarity@yahoogroups.com/msg00353.html).
I would make sure that the UMLS user has the INDEX privilege on
umlsinterfaceindex. The user account needs SELECT, INSERT,
DELETE, CREATE, DROP, and INDEX privileges.
Without the INDEX privilege, the index database just seemed to
keep getting bigger and bigger.
-Albert
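To check which of those privileges an account actually has, run MySQL's `SHOW GRANTS FOR CURRENT_USER();` and inspect the line for umlsinterfaceindex (or `*.*`). The sketch below parses one such line and lists what is missing; `missing_privileges` is a hypothetical helper, and it only understands the simple `GRANT priv, priv ON db.* TO ...` form, so column-level or role-based grants are out of scope.

```perl
use strict;
use warnings;

# Privileges listed in the thread as needed on the umlsinterfaceindex database.
my @REQUIRED = qw(SELECT INSERT DELETE CREATE DROP INDEX);

# Hypothetical helper: given one line of MySQL "SHOW GRANTS" output, return the
# required privileges that the line does not grant.
sub missing_privileges {
    my ($grant_line) = @_;
    return () if $grant_line =~ /^GRANT\s+ALL(?:\s+PRIVILEGES)?\s+ON\s/i;
    my ($privs) = $grant_line =~ /^GRANT\s+(.*?)\s+ON\s/i
        or return @REQUIRED;             # not a recognizable GRANT line
    my %have = map { uc($_) => 1 } split /\s*,\s*/, $privs;
    return grep { !$have{$_} } @REQUIRED;
}

my $line = q{GRANT SELECT, INSERT, DELETE, CREATE, DROP ON `umlsinterfaceindex`.* TO 'emilio'@'localhost'};
my @missing = missing_privileges($line);
print "missing: @missing\n";             # prints "missing: INDEX"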
On Aug 30, 2016, at 9:18 AM, Ted Pedersen duluth...@gmail.com
[umls-similarity] <umls-similarity@yahoogroups.com> wrote:
Building an index for all the UMLS sources can be time consuming. I
don't know that I've ever actually done that, since some of
the sources probably aren't going to be relevant (depending
on the nature of your data and experiments). Three weeks isn't
totally surprising, although it seems long. I might suggest
building up to 44 sources gradually: start with 1 or 2,
see how long that takes, and then try 5 or 6, etc., so that you have a
sense of how long more sources will take. Also, you may want to
be more selective about which sources you include, just to keep
things more efficient. I hope this helps!
Good luck,
Ted
On Tue, Aug 30, 2016 at 6:06 AM, Emilio Centeno Ortiz
ecent...@imim.es [umls-similarity]
<umls-similarity@yahoogroups.com> wrote:
Hi Ted,
We included all UMLS sources (44) in the configuration file,
and the indexing process has been running for 3 weeks (and is still
running). The umlsinterfaceindex database is currently using
117G of disk space.
We included all these sources because, once the indexing is
over, we want to test some similarity measures between concepts
of the same semantic type, for different semantic types (which
implies different sources), i.e. compare diseases with
diseases, compounds with compounds, etc. This way we do the
indexing just once, and we can do any comparison we want
without having to rebuild the umlsinterfaceindex database
every time we switch sources.
Was I wrong to include so many sources? Is the package going to
work with so many sources?
Thanks in advance,
Emilio
On 08/08/2016 04:44 AM, Ted Pedersen duluth...@gmail.com
[umls-similarity] wrote:
The time to build an index really varies quite a bit, and
depends both on the hardware you have available and on the
particular sources and relations you are using. If possible, be
patient and let the index build finish, since that really does
speed up queries. And I hope your index has already been built
by the time you read this!
Good luck,
Ted
On Thu, Aug 4, 2016 at 8:57 AM, Emilio Centeno Ortiz
ecent...@imim.es [umls-similarity]
<umls-similarity@yahoogroups.com> wrote:
Hi Ted,
Thanks for the reply. I finally got it working.
I didn't know that the UMLS Semantic Network was required
to initialize the UMLS::Interface object. Since
we don't have the Semantic Network for umls_2016AA, I just
switched to umls_2015AB. Now it is working.
Now that I'm running a test, the package is creating an
index database that will save time in future queries.
How long will this take?
I hope I will ask about more interesting issues shortly :)
Thanks again,
Emilio
On 08/04/2016 03:30 PM, Ted Pedersen duluth...@gmail.com
[umls-similarity] wrote:
Hi Emilio,
I'm afraid I haven't seen this error before. It looks
like a fairly generic Perl DBI error, so I wonder whether
everything is working ok with that module. If possible, it
might be good to run the DBI tests again just to make
sure that it is installed and working ok. Please let us know
what you find if you are able to do that...
Sorry I can't be more specific, but keep us posted and we
might be able to do more...
Good luck,
Ted
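A quicker (and much weaker) sanity check than re-running the DBI test suite is to confirm that DBI and the MySQL driver load at all and report their versions. This is a minimal sketch; `module_status` is a hypothetical helper name, and it only tests that each module compiles and loads, not that it can reach a database.

```perl
use strict;
use warnings;

# Hypothetical helper: for each module name, record its $VERSION if it loads,
# or undef if loading fails.
sub module_status {
    my (@modules) = @_;
    my %status;
    for my $mod (@modules) {
        (my $file = "$mod.pm") =~ s{::}{/}g;     # DBD::mysql -> DBD/mysql.pm
        if (eval { require $file; 1 }) {
            $status{$mod} = $mod->VERSION // 'unknown version';
        } else {
            $status{$mod} = undef;               # failed to load
        }
    }
    return \%status;
}

my $status = module_status(qw(DBI DBD::mysql));
for my $mod (sort keys %$status) {
    printf "%-12s %s\n", $mod,
        defined $status->{$mod} ? "loaded, version $status->{$mod}" : 'NOT loadable';
}
```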
On Tue, Aug 2, 2016 at 2:21 AM, Emilio Centeno Ortiz
ecent...@imim.es [umls-similarity]
<umls-similarity@yahoogroups.com> wrote:
Hi again,
Now that our sysadmin has granted us full access
to the UMLS database, I'm trying to create the
UMLS::Interface object:
my $umls = UMLS::Interface->new({"driver" => "mysql",
"database" => "umls_2016AA",
"username" => "myusername",
"password" => "mypassword",
"hostname" => "localhost",
"port" => "3306"
});
But I get this output:
UMLS-Interface Configuration Information:
(Default Information - no config file)
Sources (SAB):
MSH
Relations (REL):
PAR
CHD
Sources (SABDEF):
UMLS_ALL
Relations (RELDEF):
UMLS_ALL
ERROR: UMLS::Interface::STFinder->_loadSemanticNetwork
Database error (Error Code 1).
Error executing database query: DBI::st=HASH(0x3b643e8)->errstr()).
Any hint about this?
Thanks in advance,
Emilio
On 07/30/2016 03:18 AM, juliana md
julian...@gmail.com [umls-similarity] wrote:
Hi Emilio,
Did you grant myusername@mylocalmachine access to
your umls_2016AA database?
Are you able to connect to your database from
mylocalmachine (by using workbench for example)
using those credentials?
Regards,
Juliana
On Jul 29, 2016 21:05, "Emilio Centeno Ortiz
ecent...@imim.es [umls-similarity]"
<umls-similarity@yahoogroups.com> wrote:
Hello,
I have just installed the UMLS::Similarity
package and copy-pasted the example code. Since
MySQL is hosted on another machine
(172.20.16.15), I tried to initialize the
interface with our connection parameters:
use strict;
use warnings;
use UMLS::Interface;
use UMLS::Similarity::lch;
use UMLS::Similarity::path;
my $umls = UMLS::Interface->new({"driver" => "mysql",
"database" => "umls_2016AA",
"username" => "myusername",
"password" => "mypassword",
"hostname" => "myMySQLHostIP",
"port" => "3306"});
die "Unable to create UMLS::Interface object.\n" if(!$umls);
my $lch = UMLS::Similarity::lch->new($umls);
die "Unable to create measure object.\n" if(!$lch);
my $path = UMLS::Similarity::path->new($umls);
die "Unable to create measure object.\n" if(!$path);
my $cui1 = "C0005767";
my $cui2 = "C0007634";
# getTermList returns an array ref of terms; take one term per CUI for display
my $ts1 = $umls->getTermList($cui1);
my $term1 = pop @{$ts1};
my $ts2 = $umls->getTermList($cui2);
my $term2 = pop @{$ts2};
my $lvalue = $lch->getRelatedness($cui1, $cui2);
my $pvalue = $path->getRelatedness($cui1, $cui2);
print "The lch similarity between $cui1 ($term1) and $cui2 ($term2) is $lvalue\n";
print "The path similarity between $cui1 ($term1) and $cui2 ($term2) is $pvalue\n";
but it complains like that:
DBI connect('database=umls_2016AA;mysql_socket=/var/run/mysqld/mysqld.sock;host=myMySQLHostIP','myusername',...) failed: Access denied for user 'myusername'@'mylocalmachinename' (using password: YES) at /soft/devel/perl-5.16.3/lib/site_perl/5.16.3/UMLS/Interface/CuiFinder.pm line 2458.
Can't call method "err" on an undefined value at /soft/devel/perl-5.16.3/lib/site_perl/5.16.3/UMLS/Interface/ErrorHandler.pm line 113.
I have just replaced hostnames, user, etc. with
"my..." names.
It looks like it tries to connect to MySQL using
sockets? Any advice on how I could overcome
this issue?
Thanks in advance,
Emilio
--
Emilio Centeno Ortiz
Research Programme on Biomedical Informatics (GRIB)
Department of Experimental and Health Sciences
Universitat Pompeu Fabra
IMIM (Hospital del Mar Medical Research Institute)
C/ Dr. Aiguader, 88
Barcelona, Spain
Tel.: +34 93 316 0536
E-mail: ecent...@imim.es
http://ibi.imim.es