Dear RDKitters,
I’d like to optimize an algorithm that is slow because of substructure
searches. I am running several million substructure searches using
mol1.HasSubstructMatch(mol2).
I have hundreds of mol1s and millions of mol2s. Most of the time mol2
is not a substructure of mol1, so I was thinking of using a filter to
skip the expensive substructure search when mol2 is guaranteed not to
be a substructure of mol1, such as when:
- The molecular formula of mol2 cannot be contained in the molecular
formula of mol1 (e.g. C5H5N versus C6H6).
- The molecular weight of mol2 is higher than the molecular weight of mol1.
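To make the first criterion concrete, here is a minimal sketch of the formula check I have in mind. It is plain Python and does not use any RDKit call; the helper names parse_formula and may_be_substructure are made up for illustration, and the parser only handles simple formulas such as C5H5N (no brackets, charges or isotopes). I am assuming hydrogens should be ignored by default, since mol2's implicit hydrogens would not normally have to be matched in mol1.

```python
import re
from collections import Counter

# One element symbol (e.g. "C", "Cl") followed by an optional count.
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_formula(formula):
    """Turn a simple formula string such as 'C5H5N' into element counts."""
    counts = Counter()
    for element, number in _TOKEN.findall(formula):
        counts[element] += int(number) if number else 1
    return counts


def may_be_substructure(query_formula, target_formula, ignore_hydrogens=True):
    """Return False only when the query formula cannot fit into the target.

    A True result means the substructure search still has to be run.
    """
    query = parse_formula(query_formula)
    target = parse_formula(target_formula)
    if ignore_hydrogens:
        # Assumption: hydrogens on the query are not required to match.
        query.pop("H", None)
    # A Counter returns 0 for elements that are absent from the target.
    return all(target[element] >= n for element, n in query.items())


if __name__ == "__main__":
    print(may_be_substructure("C5H5N", "C6H6"))  # mol2 = pyridine, mol1 = benzene
    print(may_be_substructure("C6H6", "C7H8"))   # mol2 = benzene, mol1 = toluene
```

The pyridine/benzene pair is rejected because benzene has no nitrogen, so the expensive search can be skipped for it; the benzene/toluene pair passes the filter and would go on to the real substructure search.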
I am hoping this filter would skip many substructure searches, but
have I forgotten anything else that could be used in it? Is there a
way to use a fingerprint?
I can store the molecular formula, RDKFingerprint, and molecular weight
of the mol1s and mol2s in a dictionary so I don’t have to calculate
them on the fly. Note that I do not have enough memory available to
store all the mol2s.
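A rough sketch of how I picture the cached descriptors being used is below. It is plain Python only: the dictionary keys "weight" and "fp_bits" and the function names are hypothetical, and the fingerprint is assumed to have been converted to a Python integer bitmask beforehand. It also assumes the fingerprint has the property that every bit set for a substructure is also set for the molecule containing it, which I have not verified for RDKFingerprint. The mol1 descriptors sit in a dictionary, while the mol2 descriptors are streamed one at a time so they never all have to be in memory.

```python
def passes_prefilter(query, target):
    """Cheap checks on cached descriptors; True means 'still a candidate'.

    query and target are dicts with a 'weight' (float) and an 'fp_bits'
    (int bitmask) entry. Both keys are illustrative assumptions.
    """
    if query["weight"] > target["weight"]:
        return False
    # Reject when the query has a bit set that the target does not have.
    return (query["fp_bits"] & ~target["fp_bits"]) == 0


def candidate_pairs(mol1_cache, mol2_stream):
    """Yield (mol1_id, mol2_id) pairs that survive the prefilter.

    mol1_cache: dict mapping mol1 id -> descriptor dict (kept in memory).
    mol2_stream: any iterable of (mol2 id, descriptor dict), read lazily.
    """
    for mol2_id, query in mol2_stream:
        for mol1_id, target in mol1_cache.items():
            if passes_prefilter(query, target):
                yield mol1_id, mol2_id


if __name__ == "__main__":
    # Toy descriptors, not real fingerprints.
    mol1_cache = {"m1": {"weight": 78.11, "fp_bits": 0b0111}}
    mol2_stream = iter([
        ("q1", {"weight": 50.0, "fp_bits": 0b0101}),
        ("q2", {"weight": 50.0, "fp_bits": 0b1001}),
    ])
    for pair in candidate_pairs(mol1_cache, mol2_stream):
        print(pair)
```

Only the pairs yielded here would then be passed to mol1.HasSubstructMatch(mol2).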
Any advice would be very much appreciated.
Best,
Alexis
_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss