Hi Markus,

What's happening here is that you are ending up with some aromatic
heteroatoms (likely nitrogens in this case) that are losing an attached
atom without gaining an explicit H to make up for it. It's the same
problem that leads to the kekulization failure when you try to build a
molecule from the SMILES 'c1cccn1'.
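
To see the failure in isolation, here is a minimal sketch. It assumes RDKit
is installed; the variable names are only for illustration. The two SMILES
differ only in whether the aromatic nitrogen carries its H explicitly.

```python
from rdkit import Chem
from rdkit import RDLogger

# Silence the expected "Can't kekulize mol" message for this demo.
RDLogger.DisableLog('rdApp.*')

# Aromatic N with no H stated: no valid Kekule structure can be assigned,
# so sanitization fails and MolFromSmiles returns None.
bad = Chem.MolFromSmiles('c1cccn1')

# Pyrrole with the H on the nitrogen written out: parses fine.
good = Chem.MolFromSmiles('c1ccc[nH]1')

print(bad is None)
print(good.GetNumAtoms())
```

Deleting a dummy atom that was attached to an aromatic nitrogen leaves you
with the first case.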

The easiest solution is to replace the dummy atoms with Hs. So if you
replace your dummy-atom removal script with something like this it should
work:

------------------------------------------------------------
import re
resultsList = pd.DataFrame()

with open('my_csv.csv', 'a') as f:

    for smi in df5['ROMol']:
        # use isomericSmiles to make sure chirality doesn't get lost
        smi = Chem.MolToSmiles(smi,isomericSmiles=True)
        # replace each (possibly labelled) dummy atom with an explicit H
        smi = re.sub(r"(\[[0-9]*\*\])", "[H]", smi)
        pattern = Chem.MolFromSmiles(smi)
------------------------------------------------------------

The Hs will be removed when the molecule is built from SMILES.
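
As a string-level illustration of the difference, here is a small sketch
using only the standard library. The function names and the example
fragment SMILES are made up for this illustration, and strip_dummies is an
adaptation of the deletion approach that also handles labelled dummies.

```python
import re

def strip_dummies(smi):
    """Delete dummy atoms outright (the approach that causes the problem)."""
    smi = re.sub(r"\(\[[0-9]*\*\]\)", "", smi)  # dummies written as a branch
    return re.sub(r"\[[0-9]*\*\]", "", smi)     # remaining dummies

def dummies_to_h(smi):
    """Replace every dummy atom, labelled or not, with an explicit H."""
    return re.sub(r"\[[0-9]*\*\]", "[H]", smi)

# Hypothetical fragment: an indole whose nitrogen carried the dummy atom.
frag = "[5*]n1ccc2ccccc21"

stripped = strip_dummies(frag)   # aromatic n is left without its H
capped = dummies_to_h(frag)      # aromatic n keeps a neighbour, here an H

print(stripped)
print(capped)
```

The stripped string has a bare aromatic n, which is the kekulization
problem described above; the capped string keeps the valence intact.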

An aside: I believe that it's a lot more efficient (and somewhat easier to
read) to filter your data frames in one step. So you could replace:

df3 = df2[df2['HeavyAtoms']>6]
df4 = df3[df3['RingAroms'] > 0]
df5 = df4[df4['NumRings'] > 1]

with:

df5 = df2[(df2.NumRings>1) & (df2.RingAroms>0) & (df2.HeavyAtoms>6)]
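
A quick way to convince yourself the two forms select the same rows is a
toy frame like the one below. This is only a sketch, assuming pandas is
installed; the data values are invented and only the column names come
from your script.

```python
import pandas as pd

# Invented descriptor values, one row per fragment.
frags = pd.DataFrame({
    'HeavyAtoms': [5, 9, 12, 8],
    'RingAroms':  [1, 0, 2, 1],
    'NumRings':   [1, 2, 2, 2],
})

# Step-by-step filtering, each step building an intermediate frame.
step = frags[frags['HeavyAtoms'] > 6]
step = step[step['RingAroms'] > 0]
step = step[step['NumRings'] > 1]

# One-step filtering with a combined boolean mask. The parentheses are
# required because & binds more tightly than the comparisons.
mask = (frags['NumRings'] > 1) & (frags['RingAroms'] > 0) & (frags['HeavyAtoms'] > 6)
one_step = frags[mask]

print(one_step.equals(step))
```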

Best,
-greg



On Thu, Aug 4, 2016 at 10:23 PM, Markus Metz <[email protected]> wrote:

> Hello all:
>
> I am trying to use the brics algorithm to fragment my compounds, filter
> the fragments and try to group the original compounds by selected fragments.
>
> As test I used the cdk2 data set provided by rdkit.
>
> Here is a sample code partly cannibalizing Greg's and others' example
> code:
>
>
> This part creates and displays the fragments:
> ------------------------------------------------------------
> -----------------------------------------------------------
> from rdkit.Chem import BRICS
>
> df = PandasTools.LoadSDF('cdk2.sdf')
> df.describe()
>
> allfrags=set()
>
> for i,rows in df.iterrows():
>     mol = rows['ROMol']
>     pieces = BRICS.BRICSDecompose(mol)
>     allfrags.update(pieces)
>
> from rdkit.Chem import Descriptors
> from rdkit.Chem import rdMolDescriptors
>
> fragList = list(allfrags)
> df1 = pd.Series(fragList)
> df2 = df1.to_frame()
> df2.columns = ['smiles']
> PandasTools.AddMoleculeColumnToFrame(df2,smilesCol='smiles',
> molCol='ROMol')
>
> df2['NumRings'] = df2['ROMol'].map(rdMolDescriptors.CalcNumRings)
> df2['RingAroms'] = df2['ROMol'].apply(lambda x:
> Descriptors.NumAromaticRings(x))
> df2['HeavyAtoms'] = df2['ROMol'].apply(lambda x:
> Descriptors.HeavyAtomCount(x))
>
> df3 = df2[df2['HeavyAtoms']>6]
> df4 = df3[df3['RingAroms'] > 0]
> df5 = df4[df4['NumRings'] > 1]
>
> PandasTools.FrameToGridImage(df5, column='ROMol')
>
>
>
> This part removes the dummy atoms from smiles and tries to regenerate mol
> objects:
> ------------------------------------------------------------
> -----------------------------------------------------------
> import re
> resultsList = pd.DataFrame()
>
> with open('my_csv.csv', 'a') as f:
>
>     for smi in df5['ROMol']:
>         smi = Chem.MolToSmiles(smi)
>         smi = re.sub(r"(\(\[\*\]\))", "", smi)
>         smi = re.sub(r"(\[\*\])", "", smi)
>
>         pattern = Chem.MolFromSmiles(smi)
> ------------------------------------------------------------
> -----------------------------------------------------------
>
> This throws an error saying:
> RDKIT Error: Can't kekulize mol
>
> Do you know what is going on?
>
> Many thanks in advance,
> Markus
>
> ------------------------------------------------------------
> ------------------
>
> _______________________________________________
> Rdkit-discuss mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
>
