Hello, I'm trying to "recreate" a template ROMol compound by editing an RWMol with a stochastic algorithm. To guide the algorithm I would like to use Morgan fingerprints, with the goal of reaching a Tanimoto coefficient of 1 between the current molecule's fingerprint and the template (reference) molecule's fingerprint.
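For reference, a count-based Tanimoto coefficient is sum(min) / (sum(A) + sum(B) - sum(min)), taken over the per-feature counts, so it only reaches exactly 1 when every feature has the same count in both fingerprints. The minimal sketch below uses a std::map as a hypothetical stand-in for RDKit's SparseIntVect. I'm assuming, not asserting, that RDKit's TanimotoSimilarity on SparseIntVect follows this count-based definition, and the empty-input convention is my own choice.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical stand-in for a count-based fingerprint such as
// RDKit::SparseIntVect<std::uint32_t>: feature id -> occurrence count.
using CountFingerprint = std::map<std::uint32_t, int>;

// Count-based Tanimoto: sum(min) / (sum(a) + sum(b) - sum(min)).
// Equals 1.0 only when every feature has the same count in both inputs.
double countTanimoto(const CountFingerprint& a, const CountFingerprint& b) {
  long long sumA = 0, sumB = 0, sumMin = 0;
  for (const auto& [id, n] : a) {
    assert(n >= 0);  // feature counts are never negative
    sumA += n;
    auto it = b.find(id);
    if (it != b.end()) sumMin += std::min(n, it->second);
  }
  for (const auto& entry : b) sumB += entry.second;
  const long long denom = sumA + sumB - sumMin;
  // Convention chosen for this sketch: two empty fingerprints score 0.0.
  if (denom == 0) return 0.0;
  return static_cast<double>(sumMin) / static_cast<double>(denom);
}
```

For example, with a = {1:2, 5:1, 9:1} and c = {1:2, 5:1, 7:1} the sketch gives 3 / (4 + 4 - 3) = 0.6, so a single mismatched feature is enough to keep the score well below 1.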
The algorithm is capable of recreating the template compound (i.e. when the structures are drawn they are identical). However, the Tanimoto coefficient is far from 1. The differences between their fingerprints seem to stem from the different sanitization protocols: the template ROMol is sanitized when it is built, while the RWMol is not. Below is a C++ code snippet to illustrate this:

    #include <GraphMol/GraphMol.h>
    #include <GraphMol/MolOps.h>
    #include <GraphMol/SmilesParse/SmilesParse.h>
    #include <GraphMol/ChemTransforms/ChemTransforms.h>
    #include <GraphMol/Fingerprints/MorganFingerprints.h>
    #include <DataStructs/SparseIntVect.h>
    #include <iostream>

    // Template molecule
    RDKit::ROMol* full_mol = RDKit::SmilesToMol("C1OCC(S1)C1=NC=NC=N1");
    RDKit::SparseIntVect<unsigned>* fp1 = RDKit::MorganFingerprints::getFingerprint(*full_mol, 2);

    // Components to rebuild the molecule
    RDKit::ROMol* frag1 = RDKit::SmilesToMol("C1=NC=NC=N1");
    RDKit::ROMol* frag2 = RDKit::SmilesToMol("C1CSCO1");

    // Rebuilt molecule: join atom 2 (triazine C) to atom 7 (ring C next to S)
    RDKit::ROMol* combined_romol = RDKit::combineMols(*frag1, *frag2);
    RDKit::RWMol combined_rwmol(*combined_romol);
    combined_rwmol.addBond(2, 7, RDKit::Bond::SINGLE);

    // ERROR: Invariant violation
    // RDKit::SparseIntVect<unsigned>* fp2 = RDKit::MorganFingerprints::getFingerprint(combined_rwmol, 2);

    // Barebone sanitization allows FP creation but yields the wrong fingerprint
    RDKit::MolOps::symmetrizeSSSR(combined_rwmol);
    RDKit::SparseIntVect<unsigned>* fp2 = RDKit::MorganFingerprints::getFingerprint(combined_rwmol, 2);
    double tc = RDKit::TanimotoSimilarity(*fp1, *fp2);
    std::cout << "Tc with minimal sanitization: " << tc << std::endl;

    // Full sanitization yields the correct fingerprint
    RDKit::MolOps::sanitizeMol(combined_rwmol);
    delete fp2;  // free the previous fingerprint before reassigning
    fp2 = RDKit::MorganFingerprints::getFingerprint(combined_rwmol, 2);
    tc = RDKit::TanimotoSimilarity(*fp1, *fp2);
    std::cout << "Tc with full sanitization: " << tc << std::endl;

I'm having trouble sanitizing the RWMol that is being manipulated because it contains pseudoatoms (atomic number = 0), which seem to be a problem for kekulization. Currently I only apply the bare minimum of sanitization, symmetrizeSSSR, so that the fingerprint function can run. However, with only this step the fingerprints are not equivalent.
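To make the symptom concrete: Morgan fingerprints are built from per-atom invariants (element, degree, hydrogen count, ring membership and similar properties), so any property that differs between the sanitized and the unsanitized molecule changes the hashed features. The toy sketch below is a hypothetical simplification, not RDKit's data model or hashing. It shows a ring carbon with three heavy neighbours whose cached hydrogen count still reflects the isolated fragment (2 H instead of 1 H after the new bond). Whether a stale hydrogen count is what actually happens in the snippet is an assumption, not something confirmed here.

```cpp
#include <cassert>
#include <map>
#include <tuple>
#include <vector>

// Hypothetical, simplified atom record; NOT RDKit's data model. The fields
// loosely mirror properties that Morgan-style atom invariants are built from.
struct ToyAtom {
  int atomicNum;
  int heavyDegree;  // bonds to other (heavy) atoms in the graph
  int totalHs;      // hydrogen count as currently cached on the atom
  bool inRing;
};

using ToyInvariant = std::tuple<int, int, int, bool>;

// Radius-0 invariant: element, total degree (heavy + H), H count, ring flag.
ToyInvariant toyInvariant(const ToyAtom& a) {
  assert(a.totalHs >= 0 && a.heavyDegree >= 0);
  return std::make_tuple(a.atomicNum, a.heavyDegree + a.totalHs, a.totalHs,
                         a.inRing);
}

// Radius-0 "fingerprint": number of atoms sharing each invariant.
std::map<ToyInvariant, int> radiusZeroCounts(const std::vector<ToyAtom>& atoms) {
  std::map<ToyInvariant, int> counts;
  for (const ToyAtom& a : atoms) ++counts[toyInvariant(a)];
  return counts;
}
```

Because the stale atom lands on a different invariant, its feature (and every larger-radius feature built around it) no longer matches the template, which is enough to keep the Tanimoto coefficient below 1 even when the drawn structures look identical.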
Ideally I would like to skip sanitization altogether, since the molecule is constantly changing and I'm not interested in having a "chemically sound" molecule until the very last moment. Nonetheless, I would like the similarity coefficient to reach 1.

Why are fingerprints different depending on whether the molecule was sanitized or not? Is there any way to circumvent the need for sanitization, perhaps by using a different kind of fingerprint?

Best regards,
Alan
_______________________________________________ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-discuss