Hi Philipp,

It looks like the supplier thinks the line index has gone past the end of the file.

1) How large is the SMILES file that leads to this error (ls -l)?
2) Does it consistently happen at the same line number?

You can check this with something like:

    from rdkit import Chem

    suppl = Chem.SmilesMolSupplier(infile, sanitize=False, nameColumn=-1)
    i = 0  # number of molecules read so far
    while True:
        try:
            mol = next(suppl)
        except StopIteration:
            break  # normal end of file
        except Exception:
            print(f"Exception raised after {i} mols")
            raise
        i += 1

To check whether the problem is actually due to file size, you can split your input file line-wise with the coreutils split command:

    split -l <n_lines> large_file.smi large_file_ --additional-suffix=.smi

Replace <n_lines> with a number smaller than the one that causes the exception, and check whether operating on smaller chunks removes the problem.

HTH, cheers
p.

On Tue, Jun 22, 2021 at 8:19 AM Philipp Otten <philipp.ott...@gmail.com> wrote:

> Hey you lovely people,
> as I am creating a set of building blocks for my in-silico reaction, I
> downloaded various accessible databases (ChemBL28, GDB13, GDB17, Pubchem,
> emolecules and mcule) and want to just work through them with
> "HasSubstructMatch". Unfortunately I run into a "File parsing error: ran
> out of lines"
> I open the .smi files as SmilesMolSupplier and then just for loop through
> them:
>
> with open(target_file, "w") as outfile:
>     suppl = Chem.SmilesMolSupplier(infile, sanitize=False,
>                                    nameColumn=-1)
>     for mol in suppl:
>         if Descriptors.MolWt(mol) <= mwt:
>             if mol.HasSubstructMatch(pattern1) == True:
>                 mol = Chem.MolToSmiles(mol)
>                 outfile.write(mol + "\n")
>             else:
>                 continue
>         else:
>             continue
>
> I can imagine that it possibly has something to do with the length of the
> files, but I don't know how to actually fix that.
> Thanks for all your help!
> Kind regards
> Philipp
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
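
P.S. If coreutils split is not at hand (e.g. on Windows), the same line-wise chunking can be sketched in plain Python. This is only a rough equivalent of the split command above, not something RDKit provides: the function name split_smiles_file and the numeric chunk naming (large_file_0000.smi, large_file_0001.smi, ...) are my own choices for illustration.

```python
from pathlib import Path


def split_smiles_file(src, n_lines, suffix=".smi"):
    """Split src line-wise into chunks of at most n_lines lines each.

    Chunks are written next to src as <stem>_0000<suffix>, <stem>_0001<suffix>, ...
    (hypothetical naming scheme). Returns the chunk paths in order.
    """
    if n_lines < 1:
        raise ValueError("n_lines must be >= 1")
    src = Path(src)
    chunks = []
    out = None
    # newline="" keeps the original line endings untouched
    with src.open("r", newline="") as fin:
        for i, line in enumerate(fin):
            if i % n_lines == 0:
                # start a new chunk every n_lines lines
                if out is not None:
                    out.close()
                path = src.with_name(f"{src.stem}_{len(chunks):04d}{suffix}")
                chunks.append(path)
                out = path.open("w", newline="")
            out.write(line)
    if out is not None:
        out.close()
    return chunks
```

For example, split_smiles_file("large_file.smi", 1000000) would write chunks of one million lines each. One thing to keep in mind when reading the chunks back: SmilesMolSupplier treats the first line as a title line by default (titleLine=True), so unless every chunk really starts with a header you would want to pass titleLine=False, otherwise the first molecule of each chunk is skipped.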