On Sep 26, 2023, at 01:17, Ling Chan <lingtrek...@gmail.com> wrote:
> > <pKa> (1)
> 4.0999999
> ..
> Just wonder what was the rationale behind this extra "(1)" on the property
> field lines (pKa and logP in the above example)?
>
> And is there a way to get rid of these? I am not sure if this extra "(1)" is
> part of the standard sd format.

RDKit uses the increasing value as a sort of per-file registry number.
This follows the part of the standard which says "External registry
numbers must be enclosed in parentheses."

The relevant code is in Code/GraphMol/FileParsers/SDWriter.cpp :

    if (d_molid >= 0) {
      (*dp_ostream) << "(" << d_molid + 1 << ") ";
    }

There is no way to suppress this output. Not only is there no direct way
to change the d_molid, but d_molid cannot be negative, as
Code/GraphMol/FileParsers/MolWriters.h declares it as:

    unsigned int d_molid;  // the number of the molecules we wrote so far

Wim suggested a post-processing approach. Another is to write the SD
data items yourself, that is, use MolToMolBlock() to generate the
connection table/molfile as a string, then iterate through the
properties and generate the data items.

    import sys
    from rdkit import Chem

    def MolToSDFRecord(
            mol,
            includeStereo: bool = True,
            confId: int = -1,
            kekulize: bool = True,
            forceV3000: bool = False):
        mol_block = Chem.MolToMolBlock(mol, includeStereo, confId, kekulize, forceV3000)
        lines = []
        for prop_name in mol.GetPropNames():
            # These characters would break the "> <name>" data header line.
            if "\n" in prop_name or ">" in prop_name or "<" in prop_name:
                sys.stderr.write(f"WARNING: Skipping property {prop_name!r} because the "
                                 "name includes an unsupported character.\n")
                continue
            prop_value = mol.GetProp(prop_name)
            if "\n" in prop_value:
                # A blank line ends a data item, so a value containing one
                # cannot be written.
                if "\n\n" in prop_value or "\r\n\r\n" in prop_value:
                    sys.stderr.write(f"WARNING: Skipping property {prop_name!r} because the "
                                     "value includes an embedded newline.\n")
                    continue
                # Drop one trailing newline; it is added back below.
                if prop_value.endswith("\r\n"):
                    prop_value = prop_value[:-2]
                elif prop_value.endswith("\n"):
                    prop_value = prop_value[:-1]
            lines.append(f"> <{prop_name}>\n{prop_value}\n\n")
        lines.append("$$$$\n")
        return mol_block + "".join(lines)

    mol = Chem.MolFromSmiles("CCO")
    mol.SetProp("pKa","3.3\r\n")
    print(MolToSDFRecord(mol))

Andrew
da...@dalkescientific.com

_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
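
For comparison, the post-processing approach mentioned above could look roughly like the following. This is only a sketch, not Wim's actual suggestion: the function name strip_registry_numbers is made up for illustration, and it assumes every data header line has the form "> <name> (N)" and that no data value line happens to look like such a header.

```python
import re

# Hypothetical helper, not part of RDKit.
# Matches a data header line such as ">  <pKa>  (1) " and captures the
# part before the parenthesized number. The lookahead tolerates "\r\n"
# line endings without consuming them.
_HEADER_WITH_ID = re.compile(
    r"^(>[ \t]*<[^<>\n]+>)[ \t]*\(\d+\)[ \t]*(?=\r?$)",
    re.MULTILINE,
)

def strip_registry_numbers(sdf_text: str) -> str:
    """Remove the trailing "(N)" from SD data header lines."""
    return _HEADER_WITH_ID.sub(r"\1", sdf_text)

sample = (
    ">  <pKa>  (1) \n"
    "4.0999999\n"
    "\n"
    ">  <logP>  (1) \n"
    "1.2\n"
    "\n"
    "$$$$\n"
)
print(strip_registry_numbers(sample))
```

The function works on the SD file contents as a string, so it can be applied to a file read back after writing it with RDKit. A value line that consists only of something like "(1)" is left alone, because the pattern requires the leading ">" and the angle-bracketed name.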