Thanks Paolo, that's fantastic.
The first option was what I needed.
Tim
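
P.S. For the archive, here is a minimal sketch of the chunking idea from the first option that uses only the standard library, so it can be tried without RDKit installed. It opens the gzip stream in text mode instead of decoding each line by hand, and also pulls the "> <NAME>" data items out of the record text. The function names iter_sd_records and sd_properties and the sample records are made up for illustration, and the property parser assumes simple, well-formed V2000-style records; it is not a replacement for RDKit's own parsing.

```python
import gzip
import io


def iter_sd_records(handle):
    """Yield the text of each SD record, including its "$$$$" terminator line."""
    lines = []
    for line in handle:
        lines.append(line)
        if line.startswith("$$$$"):
            yield "".join(lines)
            lines = []


def sd_properties(record_text):
    """Collect the "> <NAME>" data items of one record into a dict (hypothetical helper)."""
    props = {}
    name = None
    in_data = False
    for line in record_text.splitlines():
        if not in_data:
            # Ignore the molblock itself; data items only follow "M  END".
            in_data = line.startswith("M  END")
            continue
        if line.startswith(">") and "<" in line:
            # Data header line: the field name sits between "<" and the last ">".
            name = line[line.index("<") + 1 : line.rindex(">")]
            props[name] = []
        elif name is not None:
            if line.strip() == "" or line.startswith("$$$$"):
                name = None  # a blank line ends the current data item
            else:
                props[name].append(line)
    return {key: "\n".join(value) for key, value in props.items()}


# Made-up two-record sample, gzipped in memory to stand in for "yourfile.sdf.gz".
sample = (
    "mol1\n  sketch\n\n"
    "  0  0  0  0  0  0  0  0  0  0999 V2000\n"
    "M  END\n"
    "> <ID>\nabc\n\n"
    "$$$$\n"
    "mol2\n  sketch\n\n"
    "  0  0  0  0  0  0  0  0  0  0999 V2000\n"
    "M  END\n"
    "> <ID>\nxyz\n\n"
    "> <NOTE>\nline one\nline two\n\n"
    "$$$$\n"
)
raw = gzip.compress(sample.encode("utf-8"))

# "rt" makes gzip decode the bytes, so each line arrives as str.
with gzip.open(io.BytesIO(raw), "rt", encoding="utf-8") as hnd:
    records = list(iter_sd_records(hnd))

for record in records:
    print(sd_properties(record))
```

Each yielded record is the original text, so it can still be handed to SDMolSupplier.SetData() as in Paolo's example when a mol instance is needed as well.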

On Thu, Nov 4, 2021 at 4:36 PM Paolo Tosco <paolo.tosco.m...@gmail.com>
wrote:

> Hi Tim,
>
> if you need access to the original text, you'll have to do the chunking
> yourself, e.g.:
>
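> import gzip
>
> from rdkit import Chem
>
> def molgen(hnd):
>     """Yield the raw text of each SD record, ending with its $$$$ line."""
>     mol_text_tmp = ""
>     while True:
>         line = hnd.readline()
>         if not line:
>             # End of file; any trailing text without $$$$ is dropped.
>             return
>         line = line.decode("utf-8")
>         mol_text_tmp += line
>         if line.startswith("$$$$"):
>             mol_text = mol_text_tmp
>             mol_text_tmp = ""
>             yield mol_text
>
> with gzip.open("yourfile.sdf.gz", "rb") as gzip_hnd:
>     for mol_text in molgen(gzip_hnd):
>         print(mol_text)
>         # Parse the record text; the SD properties end up on the mol.
>         suppl = Chem.SDMolSupplier()
>         suppl.SetData(mol_text)
>         mol = next(suppl)
>         if mol is not None:  # None means RDKit could not parse the record
>             print(mol.GetNumAtoms())
>         print("------------------")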
>
> If you are happy with the RDKit-generated text, you can combine the
> ForwardSDMolSupplier with the SDWriter:
>
> import gzip
> from io import StringIO
>
> from rdkit import Chem
>
> with gzip.open("yourfile.sdf.gz", "rb") as gzip_hnd:
>     with Chem.ForwardSDMolSupplier(gzip_hnd) as suppl:
>         for mol in suppl:
>             if mol is None:  # skip records RDKit could not parse
>                 continue
>             # Write the mol back out as SD text into an in-memory buffer.
>             buf = StringIO()
>             with Chem.SDWriter(buf) as w:
>                 w.write(mol)
>             print(buf.getvalue())
>             print(mol.GetNumAtoms())
>             print("------------------")
>
> Cheers,
> p.
>
> On Thu, Nov 4, 2021 at 5:09 PM Tim Dudgeon <tdudgeon...@gmail.com> wrote:
>
>> I need to access the text of each record of an SD file, as well as
>> create a mol instance.
>> I was successfully doing this using SDMolSupplier.GetItemText().
>> Then I needed to switch to handling gzipped SD files, and SDMolSupplier
>> can only take a file name in its constructor.
>> ForwardSDMolSupplier can handle a gzip file-like instance, but doesn't
>> have the GetItemText() function.
>> Reading the file records as text is easy enough, but I can't figure out
>> how to get the SD file properties (Chem.MolFromMolBlock() does not handle
>> the properties).
>>
>> Seems like there should be an easy way to handle this that I'm not seeing!
>>
>> Tim
>>
>> _______________________________________________
>> Rdkit-discuss mailing list
>> Rdkit-discuss@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>
>