Hi Mike,

I think the fact that you're seen an error message here at all is being
caused by a bug in the RDKit, which I'll take a look at.

The usual solution to problems reading molecules is to detect None, which
is what you're doing. I wouldn't expect that the script is actually dying
in the molecule parsing... are you sure that's what happens? Please include
the full traceback (what you show below is just an error message, there's
not actually an exception raised there).

Note that your script is extremely inefficient since it processes the
molecule multiple times. And since calling suppl[i] parses and creates a
new molecule every time, it's probably not doing what you think it is.
Here's another version (note tested) that should be much, much faster:


for i in range(lensuppl):
    mol = suppl[i]
    if mol is None:
        with open('.//test_out.csv', 'w', newline='') as csvfile:
            output = csv.writer(csvfile, delimiter=',', quoting =
csv.QUOTE_ALL)
            output.writerow(['Compound: ', i, ' failed.'])
    else:
        orig = Chem.MolToSmiles(mol)
        # Chem.rdmolops.SanitizeMol(suppl[i],268435455)
        smol = remover.StripMol(mol)
        y = Chem.MolToSmiles(smol)
        HAC = Chem.Lipinski.HeavyAtomCount(smol)
        HBD = Chem.Lipinski.NumHDonors(smol)
        HBA = Chem.Lipinski.NumHAcceptors(smol)
        molwt = Chem.Descriptors.MolWt(smol)
        nRotB = Chem.Lipinski.NumRotatableBonds(smol)
        nAr = Chem.Lipinski.NumAromaticRings(smol)
        InChi = Chem.MolToInchi(smol)
        TPSA = Chem.Descriptors.TPSA(smol)
        nChr = len(Chem.FindMolChiralCenters(smol,includeUnassigned=True))
        calc_values.append(str('{} {} {} {} {} {} {:.2f} {} {} {:.0f}
{}'.format(orig, y, InChi, HAC, HBD, HBA, molwt, nRotB, nAr, TPSA, nChr)))
        print(j, i)


I commented out the call to SanitizeMol(): that's not necessary since the
supplier itself has already sanitized the molecule.

Best,
-greg


On Sat, Oct 5, 2019 at 12:41 AM Mike Mazanetz <mi...@novadatasolutions.co.uk>
wrote:

> Hi RDKit gurus.
>
>
>
> I noted that compound 4 in the sdf, a poorly represented aryl system fails
> with this error.
>
>
>
> Any ideas on how to try/catch this error, or skip the molecule with an
> error, as the script dies painfully when it gets this far.  Snippet
> enclosed of the python call.
>
>
>
> Best – and thanks,
>
>
>
> mike
>
>
>
>
>
>
>
>
>
> def calc_SDF(file_name_list):
>
>   print("")
>
>   print("SDF Chunking Completed")
>
>   print("")
>
>
>
>   calc_values = []
>
>   calc_values.append(str("ORIG_SMI SMILES InChi HAC HBD HBA molWt nRotB
> nAr TPSA nChr"))
>
>
>
>   for j in range(len(file_name_list)):
>
>       assert os.path.isfile(file_name_list[j]), "File %s does not exist!"
> % file_name_list[j]
>
>
>
>       mols = file_name_list[j]
>
>       suppl = Chem.SDMolSupplier(mols)
>
>       lensuppl = len(suppl)
>
>       remover = SaltRemover()
>
>
>
>
>
>       for i in range(lensuppl):
>
>           if suppl[i] is None:
>
>               with open('.//test_out.csv', 'w', newline='') as csvfile:
>
>                   output = csv.writer(csvfile, delimiter=',', quoting =
> csv.QUOTE_ALL)
>
>                   output.writerow(['Compound: ', i, ' failed.'])
>
>           else:
>
>               orig = Chem.MolToSmiles(suppl[i])
>
>               Chem.rdmolops.SanitizeMol(suppl[i],268435455)
>
>               y = Chem.MolToSmiles(remover.StripMol(suppl[i]))
>
>               HAC = Chem.Lipinski.HeavyAtomCount(Chem.MolFromSmiles(y))
>
>               HBD = Chem.Lipinski.NumHDonors(Chem.MolFromSmiles(y))
>
>               HBA = Chem.Lipinski.NumHAcceptors(Chem.MolFromSmiles(y))
>
>               molwt = Chem.Descriptors.MolWt(Chem.MolFromSmiles(y))
>
>               nRotB =
> Chem.Lipinski.NumRotatableBonds(Chem.MolFromSmiles(y))
>
>               nAr = Chem.Lipinski.NumAromaticRings(Chem.MolFromSmiles(y))
>
>               InChi = Chem.MolToInchi(Chem.MolFromSmiles(y))
>
>               TPSA = Chem.Descriptors.TPSA(Chem.MolFromSmiles(y))
>
>               nChr =
> len(Chem.FindMolChiralCenters(Chem.MolFromSmiles(y),includeUnassigned=True))
>
>               calc_values.append(str('{} {} {} {} {} {} {:.2f} {} {}
> {:.0f} {}'.format(orig, y, InChi, HAC, HBD, HBA, molwt, nRotB, nAr, TPSA,
> nChr)))
>
>               print(j, i)
>
>
>
>   now = datetime.datetime.now()
>
>   print("")
>
>   print("Job ended at", now.strftime("%H:%M"), "on",
> now.strftime("%Y-%m-%d"))
>
>   log.close()
>
>   sys.stderr = save_std_err
>
>   return calc_values
>
>
>
>
>
>
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss

Reply via email to