Hi All,

Reading molecules from a bulk download of SureChEMBL, I come across a fair few molecules that fail to parse. I'm not sure whether they SHOULD parse or not.

Here is an example: https://www.surechembl.org/chemical/SCHEMBL386 with SMILES:

COC(=O)C1=C(C=CC=C1)C1=CC=C(C[N+]#[N]=[N-])C=C1

Even from reading the SMILES one can see that there are too many bonds in there: a nitrogen is both triply bonded and doubly bonded to other atoms.

Another example: https://www.surechembl.org/chemical/SCHEMBL33957 with SMILES:

NC(N)=[NH]C1=NC(CSCC[NH]=CNS(=O)(=O)C2=CC=C(Br)C=C2)=CS1

Again, the valence of a nitrogen is off.

Should I expect to parse these with RDKit? Might there be some way around this? It's a significant fraction of the molecules in SureChEMBL.

Thanks team!
Lewis
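To reproduce and inspect the failures described above, something along the lines of the following minimal sketch may help. It assumes a reasonably recent RDKit (one that provides Chem.DetectChemistryProblems); the helper name diagnose and the SMILES dictionary are made up for illustration. It only reports what RDKit objects to and does not attempt to repair the structures.

```python
from rdkit import Chem, RDLogger

# Silence RDKit's own error output so only our report is shown.
RDLogger.DisableLog("rdApp.*")

# The two SureChEMBL examples quoted in the message above.
SMILES = {
    "SCHEMBL386": "COC(=O)C1=C(C=CC=C1)C1=CC=C(C[N+]#[N]=[N-])C=C1",
    "SCHEMBL33957": "NC(N)=[NH]C1=NC(CSCC[NH]=CNS(=O)(=O)C2=CC=C(Br)C=C2)=CS1",
}


def diagnose(smiles):
    """Hypothetical helper: return (parses_with_sanitization, problems).

    problems is a list of (type, message) tuples as reported by
    Chem.DetectChemistryProblems on the unsanitized molecule.
    """
    # Default parsing sanitizes the molecule and returns None on failure.
    parses = Chem.MolFromSmiles(smiles) is not None

    # Parsing without sanitization skips the valence checks, so a
    # syntactically valid SMILES still yields a molecule object.
    raw = Chem.MolFromSmiles(smiles, sanitize=False)
    if raw is None:
        return parses, [("SmilesSyntax", "could not be read at all")]

    problems = [(p.GetType(), p.Message())
                for p in Chem.DetectChemistryProblems(raw)]
    return parses, problems


if __name__ == "__main__":
    for name, smi in SMILES.items():
        ok, problems = diagnose(smi)
        print(name, "parses with sanitization:", ok)
        for kind, message in problems:
            print("   ", kind, "-", message)
```

A molecule read with sanitize=False has not had its valences checked or its aromaticity perceived, so it is probably best treated as something to inspect or filter on rather than as a drop-in replacement for a normally parsed molecule.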
_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss

