[Rdkit-discuss] optimizing substructure search

Alexis Parenty Sat, 18 Aug 2018 02:17:29 -0700

Dear RDKit users,

I’d like to optimize an algorithm that is slow because of substructure
searches. I am running several million substructure searches using
mol1.HasSubstructMatch(mol2).


I have hundreds of mol1s and millions of mol2s. Most of the time mol2
is not a substructure of mol1, so I was thinking of using a filter to
skip the expensive substructure search when mol2 is guaranteed not to
be a substructure of mol1, for example when:

-        The molecular formula of mol2 cannot be part of the molecular
formula of mol1 (e.g. C5H5N versus C6H6)

-        The molecular weight of mol2 is higher than the molecular
weight of mol1.
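
A minimal sketch of the formula check described above, assuming per-element
atom counts have already been computed for each molecule. It counts heavy
atoms only, on the assumption that hydrogens are implicit and that a query
atom may match a target atom carrying fewer hydrogens, in which case comparing
hydrogen counts (or weights that include hydrogens) could reject a real match.
The function name `formula_allows_match` and the hand-written counts are
hypothetical and not part of RDKit.

```python
from collections import Counter


def formula_allows_match(query_counts, target_counts):
    """Return False only when the query cannot fit inside the target.

    Both arguments map element symbols to heavy-atom counts.
    A True result means "not ruled out", not "is a substructure".
    """
    return all(target_counts.get(element, 0) >= n
               for element, n in query_counts.items())


# Hand-written heavy-atom counts (hypothetical example data).
pyridine = Counter({"C": 5, "N": 1})  # C5H5N
benzene = Counter({"C": 6})           # C6H6
toluene = Counter({"C": 7})           # C7H8

pyridine_in_benzene = formula_allows_match(pyridine, benzene)
benzene_in_toluene = formula_allows_match(benzene, toluene)
```

Here mol2 plays the role of the query and mol1 the target; only pairs for
which the function returns True would go on to the full substructure search.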

I am hoping this filter would skip many substructure searches, but
have I forgotten anything else that could be used in the filter? Is
there a way to use a fingerprint?
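
One common way to use a fingerprint for this is a bit-subset screen: if the
fingerprint is built so that every feature of a substructure also sets its
bit in the parent molecule, then any bit set in mol2 but not in mol1 rules
out a match. The sketch below illustrates the test with plain Python integers
standing in for bit vectors, so it runs without RDKit; the bit patterns are
made up. In RDKit itself, Chem.PatternFingerprint and
DataStructs.AllProbeBitsMatch appear to be intended for this kind of
screening, but that pairing is a suggestion here, not something taken from
the original message.

```python
def passes_bit_screen(query_bits: int, target_bits: int) -> bool:
    """True when every bit set in the query is also set in the target.

    A False result rules the pair out; a True result still needs the
    full substructure search, because fingerprint bits can collide.
    """
    return (query_bits & target_bits) == query_bits


# Made-up bit patterns used purely for illustration.
target_fp = 0b101101
query_subset = 0b001001      # all query bits are present in the target
query_not_subset = 0b010001  # one query bit is missing from the target

subset_ok = passes_bit_screen(query_subset, target_fp)
subset_fail = passes_bit_screen(query_not_subset, target_fp)
```

The screen can only produce false positives, never false negatives, as long
as the fingerprint has the subset property described above.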



I can store the molecular formula, RDKFingerprint, and molecular weight
of the mol1s and mol2s in a dictionary so I don’t have to calculate them
on the fly. Note that I do not have enough memory available to store
all the mol2s.
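
Given the memory constraint, one possible arrangement is to keep only the few
hundred mol1 records in a dictionary and stream the mol2 records one at a
time from a generator. The sketch below assumes each record is a (fingerprint
bits, payload) pair and that the costly check is supplied as a callback; the
names `screen_stream`, `fake_match` and the toy string data are hypothetical
stand-ins, with a substring test taking the place of the real
HasSubstructMatch call.

```python
from typing import Callable, Dict, Iterable, Iterator, Tuple

Record = Tuple[int, str]  # (fingerprint bits, payload such as a SMILES)


def screen_stream(targets: Dict[str, Record],
                  queries: Iterable[Tuple[str, Record]],
                  expensive_match: Callable[[str, str], bool],
                  ) -> Iterator[Tuple[str, str]]:
    """Yield (target_name, query_name) pairs that pass both checks.

    targets stays in memory; queries is consumed lazily, so only one
    query record needs to be held at a time.
    """
    for q_name, (q_bits, q_payload) in queries:
        for t_name, (t_bits, t_payload) in targets.items():
            if (q_bits & t_bits) != q_bits:
                continue  # cheap screen failed, skip the expensive call
            if expensive_match(t_payload, q_payload):
                yield t_name, q_name


calls = []


def fake_match(target_payload: str, query_payload: str) -> bool:
    """Toy stand-in for the real substructure search."""
    calls.append((target_payload, query_payload))
    return query_payload in target_payload


def query_source() -> Iterator[Tuple[str, Record]]:
    """Toy generator; real data might be read from a file line by line."""
    yield "q1", (0b011, "bcd")
    yield "q2", (0b1000, "xyz")
    yield "q3", (0b101, "zz")


targets = {"t1": (0b111, "abcdef")}
hits = list(screen_stream(targets, query_source(), fake_match))
```

In this toy run the second query is rejected by the bit screen alone, so the
expensive callback is invoked for only two of the three queries.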



Any advice would be very much appreciated.



Best,

Alexis

