Hello,

I'm trying to "recreate" a template ROMol compound by editing a RWMol using a 
stochastic algorithm. To guide the algorithm I would like to use Morgan 
fingerprints, with the goal being to reach a Tanimoto coefficient of 1 between 
the current molecule's fingerprint and a reference molecule's fingerprint.

The algorithm is capable of recreating the template compound (i.e. when the 
structures are drawn they are identical). However, the Tanimoto coefficient is 
far from 1. The differences in their fingerprints seem to stem from the use of 
different sanitization protocols, since the template ROMol is sanitized when 
being built while the RWMol is not.

Below is a C++ code snippet to illustrate this:


  #include <iostream>
  #include <DataStructs/SparseIntVect.h>
  #include <GraphMol/GraphMol.h>
  #include <GraphMol/MolOps.h>
  #include <GraphMol/ChemTransforms/ChemTransforms.h>
  #include <GraphMol/Fingerprints/MorganFingerprints.h>
  #include <GraphMol/SmilesParse/SmilesParse.h>

  // Template molecule (sanitized by SmilesToMol)
  RDKit::ROMol* full_mol = RDKit::SmilesToMol("C1OCC(S1)C1=NC=NC=N1");
  RDKit::SparseIntVect<unsigned>* fp1 =
      RDKit::MorganFingerprints::getFingerprint(*full_mol, 2);

  // Components to rebuild the molecule
  RDKit::ROMol* frag1 = RDKit::SmilesToMol("C1=NC=NC=N1");
  RDKit::ROMol* frag2 = RDKit::SmilesToMol("C1CSCO1");

  // Rebuilt molecule: join the two fragments with a single bond
  RDKit::ROMol* combined_romol = RDKit::combineMols(*frag1, *frag2);
  RDKit::RWMol combined_rwmol(*combined_romol);
  combined_rwmol.addBond(2, 7, RDKit::Bond::SINGLE);

  // ERROR: Invariant violation
  // RDKit::SparseIntVect<unsigned>* fp2 =
  //     RDKit::MorganFingerprints::getFingerprint(combined_rwmol, 2);

  // Bare-bones sanitization allows FP creation but yields the wrong fingerprint
  RDKit::MolOps::symmetrizeSSSR(combined_rwmol);
  RDKit::SparseIntVect<unsigned>* fp2 =
      RDKit::MorganFingerprints::getFingerprint(combined_rwmol, 2);
  double tc = RDKit::TanimotoSimilarity(*fp1, *fp2);
  std::cout << "Tc with minimal sanitization: " << tc << std::endl;

  // Full sanitization yields the correct fingerprint
  RDKit::MolOps::sanitizeMol(combined_rwmol);
  fp2 = RDKit::MorganFingerprints::getFingerprint(combined_rwmol, 2);
  tc = RDKit::TanimotoSimilarity(*fp1, *fp2);
  std::cout << "Tc with full sanitization: " << tc << std::endl;


I'm having trouble sanitizing the RWMol that is being manipulated because it
contains pseudoatoms (atomic number = 0), which seem to be a problem for
kekulization. Currently I only apply the bare minimum, symmetrizeSSSR, so that
the fingerprint function can run. However, with only this step the
fingerprints are not equivalent. Ideally I would like to skip sanitization
altogether, since the molecule is constantly changing and I'm not interested in
having a "chemically sound" molecule until the very last moment.
Nonetheless, I would like the similarity coefficient to reach 1.

Why are fingerprints different depending on whether the molecule was sanitized 
or not? Is there any way to circumvent the need for sanitization, perhaps by
using a different kind of fingerprint?

Best regards,

Alan
