On Thu, May 24, 2012 at 3:38 AM, Andrew Dalke <da...@dalkescientific.com> wrote: > > I would like to save the fragment to an SD file. This seems > reasonable on the surface. But I don't know enough about the SD > file to know how meaningful that is. (For example, the above as > a 'fragment SMILES' is 'cc', which some programs accept and > many don't.) > > RDKit frowns on me trying to do that. > >>>> from rdkit import Chem >>>> mol = Chem.MolFromSmiles("c1ccccc1") >>>> emol = Chem.EditableMol(Chem.Mol()) >>>> bond = list(mol.GetBonds())[0] >>>> atoms = [bond.GetBeginAtom(), bond.GetEndAtom()] >>>> a1 = emol.AddAtom(atoms[0]) >>>> a2 = emol.AddAtom(atoms[1]) >>>> a1, a2 > (0, 1) >>>> emol.AddBond(a1, a2, bond.GetBondType()) > 1 >>>> mol2 = emol.GetMol() >>>> Chem.MolToMolBlock(mol2) > [02:47:11] non-ring atom 0 marked aromatic > Traceback (most recent call last): > File "<stdin>", line 1, in <module> > ValueError: Sanitization error: non-ring atom 0 marked aromatic >>>> > > > What I did to support this (again, I don't know if it's > really legitimate, according to the SD file conventions) > is to use "kekulize=False" > >>>> print Chem.MolToMolBlock(mol2, kekulize=False) > > RDKit > > 2 1 0 0 0 0 0 0 0 0999 V2000 > 0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 > 0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 > 1 2 4 0 > M END > > > It this reasonable for what I'm doing?
Absolutely. You're defining a query, not a molecule, so it's appropriate to use the aromatic bond type. As Gianluca points out in his message: the CTAB spec says that this bond type is intended for queries. > 2) I seem to have a problems with the code which extracts > the fragment from a molecule. Here's a reproducible, where > I think I turn carbon monoxide into a copy of itself, but > the copy can't be used to make a molecule block. > > > from rdkit import Chem > > def subgraph_to_fragment(mol, atom_indices, bond_indices): > emol = Chem.EditableMol(Chem.Mol()) > atom_map = {} > for atom_index in atom_indices: > emol.AddAtom(mol.GetAtomWithIdx(atom_index)) > atom_map[atom_index] = len(atom_map) > > for bond_index in bond_indices: > bond = mol.GetBondWithIdx(bond_index) > emol.AddBond(atom_map[bond.GetBeginAtomIdx()], > atom_map[bond.GetEndAtomIdx()], > bond.GetBondType()) > > return emol.GetMol() > > mol = Chem.MolFromSmiles("O=C") > bond_indices = [0] > ## bond = mol.GetBondWithIdx(0) > ## atom_indices = [bond.GetBeginAtomIdx(), bond.GetEndAtomIdx()] > atom_indices = [0, 1] > fragment = subgraph_to_fragment(mol, atom_indices, bond_indices) > print "====Original\n", Chem.MolToMolBlock(mol, kekulize=False) > print "SMILES", Chem.MolToSmiles(fragment) > print "====Fragment\n", Chem.MolToMolBlock(fragment, kekulize=False) > > > However, that gives the output and exception: > > ====Original > > RDKit > > 2 1 0 0 0 0 0 0 0 0999 V2000 > 0.0000 0.0000 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 > 0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 > 1 2 2 0 > M END > > SMILES C=O > ====Fragment > [03:12:30] > > **** > Pre-condition Violation > not initialized > Violation occurred on line 67 in file > /tmp/homebrew-rdkit-HEAD-dB4i/Code/GraphMol/RingInfo.cpp > Failed Expression: df_init > **** > > Traceback (most recent call last): > File "fail.py", line 26, in <module> > print "====Fragment\n", Chem.MolToMolBlock(fragment, kekulize=False) > RuntimeError: Pre-condition Violation > > > > What am I doing wrong here? Nothing really... it's a wart. The code to generate mol blocks needs ring information to figure out which bonds it should wedge to mark stereochemistry (if a choice is available, the writer favors wedging non-ring bonds). The error comes if you haven't run the ring-perception algorithm on your fragments. I'm about to check in a fix for this. I also checked in a fix that makes that error message at least a little bit more useful... it now says "RingInfo not initialized" instead of "not initialized" > > I know from private discussion with Greg on another topic that > a fix to this is either Chem.GetSSSR(fragment) or > Chem.FastFindRings(fragment), as in, for example: > > fragment = subgraph_to_fragment(mol, atom_indices, bond_indices) > Chem.FastFindRings(fragment) > > That seems to work for me. Why is it needed? And why *isn't* > it needed for the SMILES output? The SMILES writer doesn't need the ring information to do its work. > When building a molecule from scratch, I'm supposed to call > sanitizeMol on the result in order to perceive the chemistry. > However, I don't want to do that on a fragment for fear that > it will modify chemistry under the assumption that it's a > full molecule. Yeah, you definitely don't want full sanitization. A call to make fragments maximally useful would be: Chem.SanitizeMol(m,Chem.SanitizeFlags.SANITIZE_PROPERTIES|Chem.SanitizeFlags.SANITIZE_SYMMRINGS) This does ring perception and tallies up the valences of each atom, but shouldn't actually cause any structural changes. > > The only other previous match on Google to "Failed Expression: df_init" > is a bug report from JP, where the same exception is raised > after doing ReplaceSubstructs. Greg wrote: > > Marking this as "Won't Fix", because the current behavior is > what's supposed to happen: it is expected that the caller > will take care of sanitizing the results of ReplaceSubstructs > if they need them sanitized. The chemical reaction code behaves > similarly. > > > This seems like a rough edge that's caught a few people > and should be smoothed. Point taken. I'll look at it. -greg ------------------------------------------------------------------------------ Live Security Virtual Conference Exclusive live event will cover all the ways today's security and threat landscape has changed and how IT managers can respond. Discussions will include endpoint security, mobile security and the latest in malware threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ _______________________________________________ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-discuss