Hi All,
While reading molecules from a bulk download of SureChEMBL, I come across a
fair few molecules that fail to parse. Not sure whether they SHOULD parse or
not.

Here is an example: https://www.surechembl.org/chemical/SCHEMBL386
with SMILES code: COC(=O)C1=C(C=CC=C1)C1=CC=C(C[N+]#[N]=[N-])C=C1

Even just reading the SMILES, one can see that there are too many bonds in
there: the neutral middle nitrogen in [N+]#[N]=[N-] has both a triple and a
double bond, i.e. a valence of 5.

Another example: https://www.surechembl.org/chemical/SCHEMBL33957
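with SMILES code: NC(N)=[NH]C1=NC(CSCC[NH]=CNS(=O)(=O)C2=CC=C(Br)C=C2)=CS1

Again, the valence of a nitrogen is off: each neutral [NH] carries a double
bond, a single bond and a hydrogen, i.e. a valence of 4.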

Should I expect to parse these with RDKit? Might there be some way around
this? It's a significant fraction of the molecules in SureChEMBL.

Thanks team!
Rdkit-discuss mailing list