Dear Igor,
On Sat, May 24, 2008 at 8:57 AM, Igor Filippov [Contr]
<[email protected]> wrote:
> I noticed the following peculiarities - not sure if it's a bug or a
> feature.
> 1) When assembling a fragment like this CN(=O)=O (a nitro group),
> mol->addAtom(new Atom(7));
> mol->addAtom(new Atom(8));
> mol->addAtom(new Atom(8));
> mol->addAtom(new Atom(6));
> mol->addBond(0,1,Bond::DOUBLE);
> mol->addBond(0,2,Bond::DOUBLE);
> mol->addBond(0,3,Bond::SINGLE);
>
> it automatically gets converted to C[N+](=O)[O-]
> Not exactly what I have entered, though equivalent?
That's intentional. The charge-separated form C[N+](=O)[O-] is the one
the RDKit uses internally, because it is the form with a valid
Lewis-dot structure. In general the RDKit doesn't attempt to fix "bad"
structure drawings: if you try to sanitize a structure containing a
neutral 5-valent nitrogen, you get an error.
There are a couple of exceptions to this rule:
1) The substructure N=O with a neutral 5-valent N is converted to
[N+][O-] in order to handle nitro groups, N-oxides, etc.
2) Perchlorate is converted from Cl(=O)(=O)(=O)[O-] to
[Cl+3]([O-])([O-])([O-])[O-]
The code for this is in $RDBASE/Code/GraphMol/MolOps.cpp:cleanUp().
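To make exception 1 concrete, here's a small self-contained sketch of
that kind of rewrite. It is not the RDKit's cleanUp() code and it makes
no RDKit calls: ToyAtom, ToyBond, ToyMol, explicitValence(),
chargeSeparateNitro() and makeNitromethane() are all made-up names, and
the sketch assumes that only explicit bonds count toward the valence.

```cpp
#include <cassert>
#include <vector>

// Hypothetical minimal types for illustration; these are NOT RDKit classes.
struct ToyAtom {
  int atomicNum;
  int formalCharge;
};
struct ToyBond {
  int begin;
  int end;
  int order;  // 1 = single, 2 = double
};
struct ToyMol {
  std::vector<ToyAtom> atoms;
  std::vector<ToyBond> bonds;
};

// Sum of the bond orders around one atom (implicit hydrogens are ignored).
int explicitValence(const ToyMol &mol, int atomIdx) {
  int total = 0;
  for (const ToyBond &b : mol.bonds) {
    if (b.begin == atomIdx || b.end == atomIdx) total += b.order;
  }
  return total;
}

// For every neutral 5-valent N, rewrite one N=O as [N+]-[O-].
void chargeSeparateNitro(ToyMol &mol) {
  for (int i = 0; i < static_cast<int>(mol.atoms.size()); ++i) {
    ToyAtom &n = mol.atoms[i];
    if (n.atomicNum != 7 || n.formalCharge != 0) continue;
    if (explicitValence(mol, i) != 5) continue;
    for (ToyBond &b : mol.bonds) {
      if (b.order != 2 || (b.begin != i && b.end != i)) continue;
      int other = (b.begin == i) ? b.end : b.begin;
      ToyAtom &o = mol.atoms[other];
      if (o.atomicNum != 8 || o.formalCharge != 0) continue;
      b.order = 1;          // N=O becomes N-O
      n.formalCharge = 1;   // N picks up the positive charge
      o.formalCharge = -1;  // O picks up the negative charge
      break;                // one rewrite leaves N 4-valent, so stop here
    }
  }
}

// The fragment from the question: N(0), O(1), O(2), C(3).
ToyMol makeNitromethane() {
  ToyMol m;
  m.atoms = {{7, 0}, {8, 0}, {8, 0}, {6, 0}};
  m.bonds = {{0, 1, 2}, {0, 2, 2}, {0, 3, 1}};
  return m;
}
```

Calling chargeSeparateNitro() on the result of makeNitromethane()
leaves N with charge +1, the first O with charge -1 and a single bond
to N, and the second N=O untouched, which is the C[N+](=O)[O-] pattern.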
> 2) If I designate a bond as aromatic and it's not in a ring, the
> Sanitization procedure throws an exception - not a desired behavior for
> me, as I would like to have an opportunity to clear up AROMATIC flag
> from non-ring bonds (if it gets there by mistake), but I cannot perceive
> ring bonds before sanitization. So it's somewhat like chicken-and-eggs
> problem.
> [02:42:01] Kekulization somehow did not convert bond 2
> terminate called after throwing an instance of
> 'RDKit::MolSanitizeException'
> what(): N5RDKit20MolSanitizeExceptionE
> Aborted
You could do the ring perception first, then fix the non-ring aromatic
bonds, then sanitize. The ring perception doesn't need a "clean"
molecule.
Here's some sample code for this approach (hopefully it doesn't get
completely munged in the email):
------------------
#include <string>
#include <GraphMol/RDKitBase.h>
#include <GraphMol/SmilesParse/SmilesWrite.h>
#include <RDGeneral/RDLog.h>
using namespace RDKit;

void CleanupMolecule() {
  // build: C1CC1C(:O):O
  RWMol *mol = new RWMol();
  // add atoms and bonds; the trailing "true, true" (updateLabel,
  // takeOwnership) hands each Atom to the molecule so it isn't leaked:
  mol->addAtom(new Atom(6), true, true);  // atom 0
  mol->addAtom(new Atom(6), true, true);  // atom 1
  mol->addAtom(new Atom(6), true, true);  // atom 2
  mol->addAtom(new Atom(6), true, true);  // atom 3
  mol->addAtom(new Atom(8), true, true);  // atom 4
  mol->addAtom(new Atom(8), true, true);  // atom 5
  mol->addBond(3, 4, Bond::AROMATIC);  // bond 0
  mol->addBond(3, 5, Bond::AROMATIC);  // bond 1
  mol->addBond(3, 2, Bond::SINGLE);    // bond 2
  mol->addBond(2, 1, Bond::SINGLE);    // bond 3
  mol->addBond(1, 0, Bond::SINGLE);    // bond 4
  mol->addBond(0, 2, Bond::SINGLE);    // bond 5
  // instead of calling sanitizeMol() right away, which would generate
  // an error, we'll perceive the rings, then take care of aromatic
  // bonds that aren't in a ring, then sanitize:
  MolOps::findSSSR(*mol);
  for (ROMol::BondIterator bondIt = mol->beginBonds();
       bondIt != mol->endBonds(); ++bondIt) {
    if (((*bondIt)->getIsAromatic() ||
         (*bondIt)->getBondType() == Bond::AROMATIC) &&
        !mol->getRingInfo()->numBondRings((*bondIt)->getIdx())) {
      (*bondIt)->setIsAromatic(false);
      // NOTE: SINGLE is only a placeholder here; it isn't a
      // chemically justified choice:
      (*bondIt)->setBondType(Bond::SINGLE);
    }
  }
  // now it's safe to sanitize:
  MolOps::sanitizeMol(*mol);
  // get the canonical SMILES, including stereochemistry:
  std::string smiles = MolToSmiles(*mol, true);
  BOOST_LOG(rdInfoLog) << " fixed SMILES: " << smiles << std::endl;
  delete mol;
}
------------------
This takes a completely simple-minded approach and converts every
non-ring aromatic bond to a single bond. That is probably wrong
chemically (for the example above it turns C(:O):O into a carbon with
two single-bonded oxygens), but hopefully it gives you enough info to
get started on a correct solution.
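The reason ring perception can run before sanitization is that ring
membership only depends on connectivity, not on bond orders or charges:
a bond is in a ring exactly when its two atoms are still connected
after that bond is removed. Here's a minimal sketch of that idea with
no RDKit calls at all. It is not how findSSSR() works internally;
SketchBond, bondInRing(), clearNonRingAromaticFlags() and
makeExampleBonds() are made-up names for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal bond record for illustration; NOT an RDKit class.
struct SketchBond {
  int begin;
  int end;
  bool aromatic;
};

// True if the two atoms of bonds[skip] are still connected when that
// bond is ignored, i.e. the bond lies on at least one cycle.
bool bondInRing(int numAtoms, const std::vector<SketchBond> &bonds,
                std::size_t skip) {
  std::vector<bool> seen(numAtoms, false);
  std::vector<int> stack{bonds[skip].begin};
  seen[bonds[skip].begin] = true;
  while (!stack.empty()) {
    int cur = stack.back();
    stack.pop_back();
    if (cur == bonds[skip].end) return true;
    for (std::size_t i = 0; i < bonds.size(); ++i) {
      if (i == skip) continue;
      int next = -1;
      if (bonds[i].begin == cur) {
        next = bonds[i].end;
      } else if (bonds[i].end == cur) {
        next = bonds[i].begin;
      }
      if (next >= 0 && !seen[next]) {
        seen[next] = true;
        stack.push_back(next);
      }
    }
  }
  return false;
}

// Clears the aromatic flag on every non-ring bond and returns how many
// flags were cleared. Choosing a replacement bond order is left out.
int clearNonRingAromaticFlags(int numAtoms, std::vector<SketchBond> &bonds) {
  int cleared = 0;
  for (std::size_t i = 0; i < bonds.size(); ++i) {
    if (bonds[i].aromatic && !bondInRing(numAtoms, bonds, i)) {
      bonds[i].aromatic = false;
      ++cleared;
    }
  }
  return cleared;
}

// Same connectivity as the C1CC1C(:O):O example: 6 atoms, 6 bonds.
std::vector<SketchBond> makeExampleBonds() {
  return {{3, 4, true},  {3, 5, true},  {3, 2, false},
          {2, 1, false}, {1, 0, false}, {0, 2, false}};
}
```

With the bonds from makeExampleBonds() and 6 atoms, only the two C:O
bonds (bonds 0 and 1) get their flag cleared; the three cyclopropane
bonds are the only ones reported as ring bonds.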
> Other than that I'm happy to inform you that I have added the support
> for RDKIT to OSRA and the upcoming release will give users a choice of
> whether to compile with OpenBabel or RDKit as a molecular back-end.
Very cool! Thanks for doing this and letting me know.
-greg