Hi Mike, I think the fact that you're seen an error message here at all is being caused by a bug in the RDKit, which I'll take a look at.
The usual solution to problems reading molecules is to detect None, which is what you're doing. I wouldn't expect that the script is actually dying in the molecule parsing... are you sure that's what happens? Please include the full traceback (what you show below is just an error message, there's not actually an exception raised there). Note that your script is extremely inefficient since it processes the molecule multiple times. And since calling suppl[i] parses and creates a new molecule every time, it's probably not doing what you think it is. Here's another version (note tested) that should be much, much faster: for i in range(lensuppl): mol = suppl[i] if mol is None: with open('.//test_out.csv', 'w', newline='') as csvfile: output = csv.writer(csvfile, delimiter=',', quoting = csv.QUOTE_ALL) output.writerow(['Compound: ', i, ' failed.']) else: orig = Chem.MolToSmiles(mol) # Chem.rdmolops.SanitizeMol(suppl[i],268435455) smol = remover.StripMol(mol) y = Chem.MolToSmiles(smol) HAC = Chem.Lipinski.HeavyAtomCount(smol) HBD = Chem.Lipinski.NumHDonors(smol) HBA = Chem.Lipinski.NumHAcceptors(smol) molwt = Chem.Descriptors.MolWt(smol) nRotB = Chem.Lipinski.NumRotatableBonds(smol) nAr = Chem.Lipinski.NumAromaticRings(smol) InChi = Chem.MolToInchi(smol) TPSA = Chem.Descriptors.TPSA(smol) nChr = len(Chem.FindMolChiralCenters(smol,includeUnassigned=True)) calc_values.append(str('{} {} {} {} {} {} {:.2f} {} {} {:.0f} {}'.format(orig, y, InChi, HAC, HBD, HBA, molwt, nRotB, nAr, TPSA, nChr))) print(j, i) I commented out the call to SanitizeMol(): that's not necessary since the supplier itself has already sanitized the molecule. Best, -greg On Sat, Oct 5, 2019 at 12:41 AM Mike Mazanetz <mi...@novadatasolutions.co.uk> wrote: > Hi RDKit gurus. > > > > I noted that compound 4 in the sdf, a poorly represented aryl system fails > with this error. > > > > Any ideas on how to try/catch this error, or skip the molecule with an > error, as the script dies painfully when it gets this far. Snippet > enclosed of the python call. > > > > Best – and thanks, > > > > mike > > > > > > > > > > def calc_SDF(file_name_list): > > print("") > > print("SDF Chunking Completed") > > print("") > > > > calc_values = [] > > calc_values.append(str("ORIG_SMI SMILES InChi HAC HBD HBA molWt nRotB > nAr TPSA nChr")) > > > > for j in range(len(file_name_list)): > > assert os.path.isfile(file_name_list[j]), "File %s does not exist!" > % file_name_list[j] > > > > mols = file_name_list[j] > > suppl = Chem.SDMolSupplier(mols) > > lensuppl = len(suppl) > > remover = SaltRemover() > > > > > > for i in range(lensuppl): > > if suppl[i] is None: > > with open('.//test_out.csv', 'w', newline='') as csvfile: > > output = csv.writer(csvfile, delimiter=',', quoting = > csv.QUOTE_ALL) > > output.writerow(['Compound: ', i, ' failed.']) > > else: > > orig = Chem.MolToSmiles(suppl[i]) > > Chem.rdmolops.SanitizeMol(suppl[i],268435455) > > y = Chem.MolToSmiles(remover.StripMol(suppl[i])) > > HAC = Chem.Lipinski.HeavyAtomCount(Chem.MolFromSmiles(y)) > > HBD = Chem.Lipinski.NumHDonors(Chem.MolFromSmiles(y)) > > HBA = Chem.Lipinski.NumHAcceptors(Chem.MolFromSmiles(y)) > > molwt = Chem.Descriptors.MolWt(Chem.MolFromSmiles(y)) > > nRotB = > Chem.Lipinski.NumRotatableBonds(Chem.MolFromSmiles(y)) > > nAr = Chem.Lipinski.NumAromaticRings(Chem.MolFromSmiles(y)) > > InChi = Chem.MolToInchi(Chem.MolFromSmiles(y)) > > TPSA = Chem.Descriptors.TPSA(Chem.MolFromSmiles(y)) > > nChr = > len(Chem.FindMolChiralCenters(Chem.MolFromSmiles(y),includeUnassigned=True)) > > calc_values.append(str('{} {} {} {} {} {} {:.2f} {} {} > {:.0f} {}'.format(orig, y, InChi, HAC, HBD, HBA, molwt, nRotB, nAr, TPSA, > nChr))) > > print(j, i) > > > > now = datetime.datetime.now() > > print("") > > print("Job ended at", now.strftime("%H:%M"), "on", > now.strftime("%Y-%m-%d")) > > log.close() > > sys.stderr = save_std_err > > return calc_values > > > > > > > _______________________________________________ > Rdkit-discuss mailing list > Rdkit-discuss@lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/rdkit-discuss >
_______________________________________________ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-discuss