I'd also be curious how the index is causing you problems. All SD-reading
code that I know about ignores those suffixes. If you're not using RDKit to
read the SD file, maybe it would be best to update whatever it is you
*are* using to parse the file.
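
To illustrate what "ignoring the suffix" can look like, here is a minimal
sketch of a data-item reader that takes only the field name from the
`<...>` part of each header line and disregards anything after it. It is
not how RDKit or any particular toolkit does it; the function name
`parse_sd_data_items` is made up for this example, the logP value is
invented sample data, and the sketch assumes it is given only the text
that follows "M  END" in a single record.

```python
import re

# Field name inside <...> on an SD data header line.
_FIELD_NAME = re.compile(r"<([^<>]+)>")


def parse_sd_data_items(record_text):
    """Return {field_name: value} for the data items of one SD record.

    Hypothetical helper: anything after the <name> on the header line,
    such as a "(1)" registry number, is ignored.
    """
    props = {}
    lines = record_text.splitlines()
    i = 0
    while i < len(lines):
        line = lines[i]
        match = _FIELD_NAME.search(line) if line.startswith(">") else None
        if match is None:
            i += 1
            continue
        # Collect value lines up to the blank line that ends the data item.
        i += 1
        value_lines = []
        while i < len(lines) and lines[i].strip() != "" and lines[i] != "$$$$":
            value_lines.append(lines[i])
            i += 1
        props[match.group(1)] = "\n".join(value_lines)
    return props


# Sample data items; the logP value is invented for illustration.
record = "\n".join([
    ">  <pKa>  (1) ",
    "4.0999999",
    "",
    ">  <logP>  (1) ",
    "2.5",
    "",
    "$$$$",
])
print(parse_sd_data_items(record))
```

The same input with the "(1)" removed gives the same dictionary, which is
why the suffix is normally harmless to readers written this way.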

dan nealschneider | senior staff developer

*he/him/his*

Schrödinger, Inc. <https://schrodinger.com/>


On Fri, Sep 29, 2023 at 1:08 AM Andrew Dalke <da...@dalkescientific.com>
wrote:

> On Sep 26, 2023, at 01:17, Ling Chan <lingtrek...@gmail.com> wrote:
> > >  <pKa>  (1)
> > 4.0999999
>   ..
> > Just wonder what was the rationale behind this extra "(1)" on the
> > property field lines (pKa and logP in the above example)?
> >
> > And is there a way to get rid of these? I am not sure if this extra
> > "(1)" is part of the standard sd format.
>
> RDKit uses the increasing value as a sort of per-file registry number.
>
> This follows the part of the standard which says "External registry
> numbers must be enclosed in parentheses."
>
> The relevant code is in Code/GraphMol/FileParsers/SDWriter.cpp :
>
>   if (d_molid >= 0) {
>     (*dp_ostream) << "(" << d_molid + 1 << ") ";
>   }
>
> There is no way to suppress this output. Not only is there no direct way to
> change the d_molid, but d_molid cannot be negative as
> Code/GraphMol/FileParsers/MolWriters.h declares it as:
>
>   unsigned int d_molid;      // the number of the molecules we wrote so far
>
>
> Wim suggested a post-processing approach. Another is to write the SD data
> items yourself, that is, use MolToMolBlock() to generate the connection
> table/molfile as a string, then iterate through the properties and generate
> the data items.
>
>
> import sys
> from rdkit import Chem
>
> def MolToSDFRecord(
>         mol,
>         includeStereo: bool = True,
>         confId: int = -1,
>         kekulize: bool = True,
>         forceV3000: bool = False):
>     mol_block = Chem.MolToMolBlock(mol, includeStereo, confId, kekulize,
>                                    forceV3000)
>
>     lines = []
>     for prop_name in mol.GetPropNames():
>         if "\n" in prop_name or ">" in prop_name or "<" in prop_name:
>             sys.stderr.write(f"WARNING: Skipping property {prop_name!r} because the "
>                              "name includes an unsupported character.\n")
>             continue
>
>         prop_value = mol.GetProp(prop_name)
>         if "\n" in prop_value:
>             if "\n\n" in prop_value or "\r\n\r\n" in prop_value:
>                 sys.stderr.write(f"WARNING: Skipping property {prop_name!r} because the "
>                                  "value includes an embedded blank line.\n")
>                 continue
>             if prop_value.endswith("\r\n"):
>                 prop_value = prop_value[:-2]
>             elif prop_value.endswith("\n"):
>                 prop_value = prop_value[:-1]
>
>         lines.append(f"> <{prop_name}>\n{prop_value}\n\n")
>
>     lines.append("$$$$\n")
>
>     return mol_block + "".join(lines)
>
> mol = Chem.MolFromSmiles("CCO")
> mol.SetProp("pKa","3.3\r\n")
> print(MolToSDFRecord(mol))
>
>
>                                 Andrew
>                                 da...@dalkescientific.com
>
>
>
>
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
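
For completeness, the post-processing approach mentioned above could be
sketched roughly as follows. This is not Wim's actual suggestion, only one
way it might be done with the standard library: the function name
`strip_registry_numbers` is hypothetical, and the sketch assumes V2000-style
records where the data items sit between "M  END" and "$$$$". A data value
line that itself looks like a header with a "(n)" suffix would also be
rewritten, so check that against your own data first.

```python
import re

# A data header starts with ">" and holds the field name in <...>;
# the optional trailing "(n)" is the number SDWriter appends.
_HEADER_SUFFIX = re.compile(r"^(>\s*<[^<>]+>)\s*\(\d+\)\s*$")


def strip_registry_numbers(sdf_text):
    """Return sdf_text with the "(n)" suffix removed from data headers.

    Hypothetical helper: only lines between "M  END" and "$$$$" are
    examined, so the connection table is left untouched.
    """
    out = []
    in_data = False  # True between "M  END" and "$$$$"
    for line in sdf_text.splitlines(keepends=True):
        body = line.rstrip("\r\n")
        if body.startswith("M  END"):
            in_data = True
        elif body == "$$$$":
            in_data = False
        elif in_data:
            match = _HEADER_SUFFIX.match(body)
            if match:
                # Keep the original line ending, drop the suffix.
                line = match.group(1) + line[len(body):]
        out.append(line)
    return "".join(out)


sample = "M  END\n>  <pKa>  (1) \n4.0999999\n\n$$$$\n"
print(strip_registry_numbers(sample), end="")
```

Reading the file written by SDWriter, passing its text through this
function, and writing the result back out leaves everything except the
header suffixes unchanged.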