Andrew,

  Wow, thank you for the detailed reply.

    I am happy with the current processing time of 5 seconds to compare 400,000+ 
fingerprints, but I will look at the Stack Overflow discussion.  I am pretty 
well versed in MongoDB, but I hadn't thought about doing the whole calculation 
in MongoDB.

   I will also talk to the staff about the various fingerprints and determine 
where to go next.  The initial code was written in late 2017.  This first 
migration step was to upgrade the various libraries and packages (Open Babel 
2.4 to 3.1.1).  I hadn't really thought about looking at updated features or a 
different approach.
--------------------------------------------------------------------

......
>> With this knowledge you may be able to do the Tanimoto calculation all in 
>> MongoDB, for example, by converting the "1" bit positions to an array field 
>> and using the aggregation framework.
>>
>> (These are magic words found by reading 
>> https://stackoverflow.com/questions/27805634/can-i-calculate-the-similarity-of-document-fields-using-mapreduce
>>  ; note that "Tanimoto similarity" is the domain-specific term for what the 
>> IR field generally refers to as "Jaccard similarity").
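
A minimal sketch of the idea quoted above, assuming each document stores its "1" bit positions in an array field. The field name `bits`, the helper names, and the `threshold` parameter are hypothetical and only for illustration. The first function computes the Tanimoto (Jaccard) score in plain Python; the second only builds an aggregation pipeline as plain dicts, using the `$setIntersection`, `$setUnion`, `$size` and `$divide` operators, and has not been run against a real collection here.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto (Jaccard) similarity of two fingerprints, each given as
    a list of "1" bit positions."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    if union == 0:
        # Two empty fingerprints: report 0.0 rather than divide by zero.
        return 0.0
    return len(a & b) / union


def tanimoto_pipeline(query_bits, field="bits", threshold=0.0):
    """Build an aggregation pipeline (plain dicts) that scores every
    document against query_bits. 'field' is an assumed array field
    holding the document's "1" bit positions."""
    query_bits = sorted(set(query_bits))
    return [
        # Count shared bits and the size of the combined bit set.
        {"$addFields": {
            "common": {"$size": {"$setIntersection": ["$" + field, query_bits]}},
            "total": {"$size": {"$setUnion": ["$" + field, query_bits]}},
        }},
        # Skip documents where the union is empty (avoids dividing by zero).
        {"$match": {"total": {"$gt": 0}}},
        {"$addFields": {"tanimoto": {"$divide": ["$common", "$total"]}}},
        {"$match": {"tanimoto": {"$gte": threshold}}},
        {"$sort": {"tanimoto": -1}},
    ]


if __name__ == "__main__":
    print(tanimoto([1, 2, 3], [2, 3, 4]))
    print(tanimoto_pipeline([2, 3, 4], threshold=0.7))
```

With pymongo, the list returned by `tanimoto_pipeline` would be passed to a collection's `aggregate()` call. Whether this beats the current 5 seconds for 400,000+ fingerprints would need to be measured.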

......
    There are many different fingerprint types. The ECFP fingerprints in more 
recent releases of Open Babel may be more relevant than the Daylight-like FP2 
fingerprints. This will require talking to your user base.

Chris Wolcott
chris.wolc...@nih.gov


_______________________________________________
OpenBabel-discuss mailing list
OpenBabel-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openbabel-discuss
