Dear Alexis,

The concept you are describing is pretty much exactly the reason why
molecular keys / fingerprints were invented in the first place. I would
suggest taking a look at the RDKit database cartridge
(https://www.rdkit.org/docs/Cartridge.html), since it should do
essentially what you want: you import your millions of structures into
a database, build an index (consisting mainly of suitable fingerprints)
and then use that index to pre-filter your substructure searches.
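The same idea can be imitated in memory when only a few hundred mol1
targets need to be indexed. Below is a minimal sketch, assuming the RDKit
Python API. It uses `Chem.PatternFingerprint`, the fingerprint RDKit
provides for substructure screening, rather than `RDKFingerprint`. The
pattern fingerprint is designed so that when a query is a substructure of
a target, every bit set in the query's fingerprint is also set in the
target's. `DataStructs.AllProbeBitsMatch` checks exactly that, and only
pairs that pass the screen go on to `HasSubstructMatch`. The function name
`screened_substructure_search` is hypothetical. Queries are consumed one
at a time from any iterable, so the millions of mol2s never have to be
held in memory together.

```python
from rdkit import Chem, DataStructs


def pattern_fp(mol, fp_size=2048):
    # Substructure-screening fingerprint: if q is a substructure of m,
    # the bits set in pattern_fp(q) are a subset of those in pattern_fp(m).
    return Chem.PatternFingerprint(mol, fpSize=fp_size)


def screened_substructure_search(target_smiles, query_smiles_iter):
    """Return (matches, n_full_checks).

    matches: (query_smiles, target_smiles) pairs where the query is a
    substructure of the target, i.e. target.HasSubstructMatch(query).
    n_full_checks: how many pairs survived the fingerprint screen.
    """
    # The few hundred targets (mol1s) and their fingerprints stay in memory.
    targets = []
    for smi in target_smiles:
        mol = Chem.MolFromSmiles(smi)
        targets.append((smi, mol, pattern_fp(mol)))

    matches = []
    n_full_checks = 0
    # Queries (mol2s) are streamed, one molecule at a time.
    for qsmi in query_smiles_iter:
        qmol = Chem.MolFromSmiles(qsmi)
        if qmol is None:
            continue  # skip unparsable input
        qfp = pattern_fp(qmol)
        for tsmi, tmol, tfp in targets:
            # Cheap screen: a query bit missing from the target rules out a match.
            if not DataStructs.AllProbeBitsMatch(qfp, tfp):
                continue
            # Screen passed; only now run the expensive graph match.
            n_full_checks += 1
            if tmol.HasSubstructMatch(qmol):
                matches.append((qsmi, tsmi))
    return matches, n_full_checks
```

The screen can only discard pairs that cannot match; a pair that passes
still needs the full `HasSubstructMatch` call. The query iterable could be,
for instance, a generator that reads one SMILES per line from a file.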

Hope that helps,
Nils

On 18.08.2018 at 11:16, Alexis Parenty wrote:
> Dear RDKitters,
> 
> I’d like to optimize an algorithm that is slow because of substructure
> searches. I am running several million substructure searches using
> mol1.HasSubstructMatch(mol2).
> 
> I have hundreds of mol1s and millions of mol2s. Most of the time mol2 is
> not a substructure of mol1, so I was thinking of using a filter to skip
> the expensive substructure search when mol2 is guaranteed not to be a
> substructure of mol1, for example when:
> 
> -        The molecular formula of mol2 cannot be contained in the
> molecular formula of mol1 (e.g. C5H5N versus C6H6).
> 
> -        The molecular weight of mol2 is higher than the molecular weight of mol1.
> 
> I am hoping this filter will skip many substructure searches, but have
> I forgotten anything else that could be used in the filter? Is there a
> way to use a fingerprint?
> 
> I can store the molecular formula, RDKFingerprint, and molecular weight
> of the mol1s and mol2s in a dictionary so I don’t have to calculate them
> on the fly. Note that I do not have enough memory available to store all
> the mol2s.
> 
> Any advice would be very much appreciated.
> 
> Best,
> 
> Alexis
> 
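
The formula and weight filters from the question are only safe if they
compare heavy atoms. With default settings, a molecule query does not
compare hydrogen counts. For example, ethane (CC, C2H6) is a substructure
of acetic acid (CC(=O)O, C2H4O2) even though it has more hydrogens, and
propane (C3H8, about 44.1 g/mol) is a substructure of cyclopropane (C3H6,
about 42.1 g/mol) even though it is heavier. The minimal sketch below works
on stored formula strings, such as those returned by
`rdMolDescriptors.CalcMolFormula`, and ignores hydrogens. It assumes that
the molecules were parsed without `Chem.AddHs` and that the queries contain
no wildcard atoms. The helper names `heavy_atom_counts` and
`may_be_substructure` are hypothetical.

```python
import re
from collections import Counter

# Element symbol followed by an optional count, e.g. "C6", "N", "Cl2".
_ELEMENT_RE = re.compile(r"([A-Z][a-z]?)(\d*)")


def heavy_atom_counts(formula):
    """Parse a formula string such as 'C5H5N' into heavy-atom counts.

    Hydrogens are dropped on purpose, because implicit H counts are not
    compared by a default substructure match. Charge suffixes such as '-'
    or '+2' contain no element symbol, so the regex skips them.
    """
    counts = Counter()
    for element, n in _ELEMENT_RE.findall(formula):
        if element != "H":
            counts[element] += int(n) if n else 1
    return counts


def may_be_substructure(query_formula, target_formula):
    """Return False only if the query has more atoms of some heavy element.

    True means the expensive HasSubstructMatch call is still needed.
    """
    q = heavy_atom_counts(query_formula)
    t = heavy_atom_counts(target_formula)
    return all(t[el] >= n for el, n in q.items())


print(may_be_substructure("C5H5N", "C6H6"))  # False: benzene has no nitrogen
print(may_be_substructure("C3H8", "C3H6"))   # True: propane vs cyclopropane
```

If the element counts pass, the query's heavy-atom mass cannot exceed the
target's, so a separate weight check adds nothing. This cheap formula test
can be run before a fingerprint screen.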

_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
