Hi Lewis,

Dealing with all the strange chemical representations that show up "in the
wild" is an ongoing struggle.

Your first example is pretty clearly intended to be an azide, and we can
certainly add a rule to normalize that one to what the RDKit expects it to
be (there is already a rule for C-N=N#N, but that doesn't help here). That
won't happen before the next feature release though.

I'm not really sure what the intent was for the two four-coordinate
neutral Ns in the second molecule, so I think it's unlikely that we'd add
a standard cleanup for that one.

However! The good news is that there's a pretty easy (and efficient) way to
fix this yourself. We added a new method to chemical reactions in the
2021.09 release which allows you to modify a molecule in place (subject to
some constraints). This is ideal for doing cleanup transformations like
these.

This gist shows how to write reaction rules for your cases (I guessed at
what the Ns are supposed to be) and then use them:
https://gist.github.com/greglandrum/8fd229bc6bf6c734d1c21da7f2bebebb
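
For the archive, here is a minimal sketch of the idea for the azide case
only. It is not copied from the gist: the reaction SMARTS, the constant
AZIDE_FIX, and the helper fix_azide are illustrative names and guesses.
The in-place method is RunReactantInPlace(); it needs a reaction with one
reactant and one product template that does not add atoms. The SMILES is
parsed with sanitize=False so that the bad valence does not stop parsing
before the rule can be applied.

```python
from rdkit import Chem
from rdkit.Chem import rdChemReactions

# Assumed cleanup rule (a guess at the intent, not the gist's rule):
# rewrite C-[N+]#N=[N-] as the form RDKit accepts, C-N=[N+]=[N-].
AZIDE_FIX = rdChemReactions.ReactionFromSmarts(
    "[N+:1]#[N+0:2]=[N-:3]>>[N+0:1]=[N+:2]=[N-:3]"
)
AZIDE_FIX.Initialize()


def fix_azide(smiles):
    """Parse without sanitization, apply the rule in place, then sanitize."""
    mol = Chem.MolFromSmiles(smiles, sanitize=False)
    if mol is None:
        return None  # syntactically invalid SMILES
    # Compute valence information without raising on the bad nitrogen.
    mol.UpdatePropertyCache(strict=False)
    # Modifies mol directly; returns False if the pattern is not present.
    AZIDE_FIX.RunReactantInPlace(mol)
    # Raises an exception if the molecule is still chemically invalid.
    Chem.SanitizeMol(mol)
    return mol


if __name__ == "__main__":
    smi = "COC(=O)C1=C(C=CC=C1)C1=CC=C(C[N+]#[N]=[N-])C=C1"
    print(Chem.MolToSmiles(fix_azide(smi)))
```

The four-coordinate neutral Ns in the second molecule could be handled the
same way with a second rule, once you decide what they are supposed to be.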

Hope this helps,
-greg


On Wed, Dec 15, 2021 at 12:21 AM Lewis Martin <lewis.marti...@gmail.com>
wrote:

> Hi All,
> Reading molecules from a bulk download of SureChEMBL, I come across a fair
> few molecules that fail to parse. Not sure whether they SHOULD parse or
> not.
>
> Here is an example: https://www.surechembl.org/chemical/SCHEMBL386
> with SMILES code: COC(=O)C1=C(C=CC=C1)C1=CC=C(C[N+]#[N]=[N-])C=C1
>
> Even reading the SMILES code one can see that there are too many bonds in
> there - a nitrogen triply bonded and doubly bonded to other atoms.
>
> Another example: https://www.surechembl.org/chemical/SCHEMBL33957
> smiles: NC(N)=[NH]C1=NC(CSCC[NH]=CNS(=O)(=O)C2=CC=C(Br)C=C2)=CS1
>
> Again, valence for a nitrogen is off.
>
> Should I expect to parse these with RDKit? Might there be some way around
> this? It's a significant fraction of the molecules in SureChEMBL.
>
> Thanks team!
> Lewis
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>