Hi Philipp,

It looks like the supplier thinks the line index has gone past the end of
the file.
1) How large is the SMILES file which leads to this error (ls -l)?
2) Does it consistently happen at the same line number?
You can check this with something like:

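from rdkit import Chem

infile = "large_file.smi"  # placeholder: path to the SMILES file that fails

suppl = Chem.SmilesMolSupplier(infile, sanitize=False, nameColumn=-1)
i = 0
while True:
    try:
        mol = next(suppl)
    except StopIteration:
        # Normal end of file: every record was read without error.
        break
    except Exception:
        # Report how far the supplier got before failing, then re-raise.
        print(f"Exception raised after {i} mols")
        raise
    i += 1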
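To answer question 1 and to look at the raw input where the loop stops, the standard library is enough. Below is a minimal sketch (the helper names file_summary and get_line are hypothetical, not part of RDKit) that reports the file size in bytes (what ls -l shows), the number of lines, whether the last line ends with a newline, and the text of a given 1-based line. It reads in binary chunks, so it should also cope with multi-GB files.

```python
import os
from itertools import islice


def file_summary(path):
    """Return (size in bytes, number of lines, last line ends with newline)."""
    size = os.path.getsize(path)
    n_lines = 0
    last_chunk = b""
    # Read 1 MiB binary chunks so a huge file is never loaded into memory at once.
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            n_lines += chunk.count(b"\n")
            last_chunk = chunk
    ends_with_newline = last_chunk.endswith(b"\n")
    # A final line without a trailing newline still counts as a line.
    if last_chunk and not ends_with_newline:
        n_lines += 1
    return size, n_lines, ends_with_newline


def get_line(path, lineno):
    """Return the text of 1-based line `lineno`, or None if the file is shorter."""
    with open(path, "rb") as fh:
        # islice skips lineno - 1 lines and yields at most one line.
        for line in islice(fh, lineno - 1, lineno):
            return line.rstrip(b"\r\n").decode("utf-8", errors="replace")
    return None
```

For example, after the loop above stops with i mols read, printing get_line(infile, n) for a few n around i shows the raw lines near the failure. The exact offset between the molecule count and the line number depends on the title line (and possibly blank lines), so it is worth looking at several.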

To check whether the problem is actually due to file size, you can split
your input file line-wise with the coreutils split command:

split -l <n_lines> large_file.smi large_file_ --additional-suffix=.smi

Replace <n_lines> with a number smaller than the line number at which the
exception is raised, and check whether working on smaller chunks makes the
problem go away.
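If GNU coreutils is not at hand (on Windows, for instance), a rough Python equivalent of the split call above is sketched here. The function name split_by_lines and its output naming (prefix plus a zero-padded counter instead of split's aa, ab, ... suffixes) are my own assumptions; it copies lines byte for byte and only ever holds one line in memory.

```python
def split_by_lines(path, n_lines, prefix, suffix=".smi"):
    """Split `path` into files of at most `n_lines` lines each.

    Output files are named <prefix>000<suffix>, <prefix>001<suffix>, ...
    Returns the list of file names written, in order.
    """
    if n_lines < 1:
        raise ValueError("n_lines must be at least 1")
    written = []
    out = None
    try:
        # Binary mode keeps line endings exactly as they are in the input.
        with open(path, "rb") as src:
            for i, line in enumerate(src):
                if i % n_lines == 0:
                    # Start a new chunk file every n_lines lines.
                    if out is not None:
                        out.close()
                    name = f"{prefix}{len(written):03d}{suffix}"
                    out = open(name, "wb")
                    written.append(name)
                out.write(line)
    finally:
        # Make sure the last chunk is flushed and closed, even on error.
        if out is not None:
            out.close()
    return written
```

Usage would be something like split_by_lines("large_file.smi", 1000000, "large_file_"). One thing to keep in mind with any split: SmilesMolSupplier treats the first line of a file as a title line by default (titleLine=True), so if the chunks have no header line you probably want titleLine=False when reading them, otherwise the first molecule of every chunk is skipped.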

HTH, cheers
p.


On Tue, Jun 22, 2021 at 8:19 AM Philipp Otten <philipp.ott...@gmail.com>
wrote:

> Hey you lovely people,
> as I am creating a set of building blocks for my in-silico reaction, I
> downloaded various accessible databases (ChEMBL28, GDB13, GDB17, PubChem,
> eMolecules and mcule) and want to just work through them with
> "HasSubstructMatch". Unfortunately, I run into a "File parsing error: ran
> out of lines".
> I open the .smi files with SmilesMolSupplier and then just loop through
> them with a for loop:
>
> with open(target_file, "w") as outfile:
>     suppl = Chem.SmilesMolSupplier(infile, sanitize=False, nameColumn=-1)
>     for mol in suppl:
>         if Descriptors.MolWt(mol) <= mwt:
>             if mol.HasSubstructMatch(pattern1) == True:
>                 mol = Chem.MolToSmiles(mol)
>                 outfile.write(mol + "\n")
>             else:
>                 continue
>         else:
>             continue
>
> I can imagine that it possibly has something to do with the length of the
> files, but I don't know how to actually fix that.
> Thanks for all your help!
> Kind regards
> Philipp
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>