Hi Tim, if you need access to the original text, you'll have to do the chunking yourself, e.g.:

    import gzip
    from rdkit import Chem

    def molgen(hnd):
        # Accumulate lines until the "$$$$" record terminator, then yield the record.
        mol_text_tmp = ""
        while True:
            line = hnd.readline()
            if not line:
                return
            line = line.decode("utf-8")
            mol_text_tmp += line
            if line.startswith("$$$$"):
                mol_text = mol_text_tmp
                mol_text_tmp = ""
                yield mol_text

    with gzip.open("yourfile.sdf.gz", "rb") as gzip_hnd:
        for mol_text in molgen(gzip_hnd):
            print(mol_text)
            # SetData() parses the whole record, including the SD properties.
            suppl = Chem.SDMolSupplier()
            suppl.SetData(mol_text)
            mol = next(suppl)
            if mol is not None:  # None means RDKit could not parse the record
                print(mol.GetNumAtoms())
            print("------------------")

If you are happy with the RDKit-generated text, you can combine the ForwardSDMolSupplier with the SDWriter:

    import gzip
    from io import StringIO
    from rdkit import Chem

    with gzip.open("yourfile.sdf.gz", "rb") as gzip_hnd:
        with Chem.ForwardSDMolSupplier(gzip_hnd) as suppl:
            for mol in suppl:
                if mol is None:  # skip records RDKit could not parse
                    continue
                buf = StringIO()
                with Chem.SDWriter(buf) as w:
                    w.write(mol)
                # The writer is closed at this point, so the buffer is complete.
                print(buf.getvalue())
                print(mol.GetNumAtoms())
                print("------------------")

Cheers,
p.

On Thu, Nov 4, 2021 at 5:09 PM Tim Dudgeon <tdudgeon...@gmail.com> wrote:

> I am needing to access the text of each record of a SDF, as well as
> creating a mol instance.
> I was successfully doing this using SDMolSupplier.GetItemText().
> Then I needed to switch to handling gzipped SD files, and SDMolSupplier
> can only take a file name in its constructor.
> ForwardSDMolSupplier can handle a gzip file-like instance, but doesn't
> have the GetItemText() function.
> Reading the file records as text is easy enough, but I can't figure out
> how to get the SD file properties (Chem.MolFromMolBlock() does not handle
> the properties).
>
> Seems like there should be an easy way to handle this that I'm not seeing!
>
> Tim
>
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
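P.S. A small variation on the chunking idea, in case it is useful: opening the gzip stream in text mode ("rt") avoids decoding each line by hand. This is only a sketch and not code from the thread; the helper name iter_sdf_records and the two dummy records are made up for illustration, and it uses just the standard library so it runs without RDKit. As in the chunking approach, a trailing record that lacks a "$$$$" line is silently dropped.

```python
import gzip
import io


def iter_sdf_records(text_handle):
    """Yield each SD record, up to and including its '$$$$' line, as one string."""
    lines = []
    for line in text_handle:
        lines.append(line)
        if line.startswith("$$$$"):
            yield "".join(lines)
            lines = []
    # Anything left in `lines` here is an unterminated record and is dropped.


# Hypothetical two-record payload, gzipped in memory to stand in for a real file.
sample = (
    "mol1\n  dummy block\n> <ID>\n1\n\n$$$$\n"
    "mol2\n  dummy block\n> <ID>\n2\n\n$$$$\n"
)
compressed = gzip.compress(sample.encode("utf-8"))

# "rt" makes gzip return decoded text lines instead of bytes.
with gzip.open(io.BytesIO(compressed), "rt", encoding="utf-8") as hnd:
    records = list(iter_sdf_records(hnd))

print(len(records))
```

Each yielded string should be usable with SDMolSupplier.SetData() in the same way as the record text produced by chunking the file by hand.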