[umls-similarity] What is the most efficient/scalable way to compute similarity and relatedness on a big dataset?

yonatanbitt...@gmail.com [umls-similarity] Tue, 27 Aug 2019 14:20:28 -0700

Hello. I've succeeded in running umls-similarity.pl from the shell.
 

 I want to use and cite this project in a research paper I'm writing, so I need 
to compute similarity and relatedness on a large dataset (it can reach more 
than a million pairs).


 The current problem is that running it from the shell takes too long (even 
after the index has been built on the first run).
 

 My questions are:
 1. What is the best way to calculate similarity and relatedness on a big 
dataset? 
 2. Is it possible to compute similarity and relatedness on a big dataset? 
 3. Does anyone have code examples for this job or a similar one? For example, 
calculating similarity and relatedness for pairs of words stored in some file 
format (csv, txt, xlsx, json, etc.). I'm not familiar with Perl, but I'll learn 
whatever is needed to run this code.
 

 Thanks a lot!
