I'd also be curious how the index is causing you problems. All SD reading code that I know about ignores those suffixes. If you're not using RDKit to read the SD file, maybe it would be best to update whatever it is you *are* using to parse the file.
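To illustrate what "ignores those suffixes" can look like in practice, here is a minimal sketch of a reader that takes the field name from inside the angle brackets of a data header line and discards everything after the closing bracket. The helper name `data_header_field_name` and the regular expression are my own for illustration; this is not RDKit's parser, and it assumes field names contain no "<" or ">".

```python
import re

# Hypothetical helper for illustration only; not RDKit's parser.
# A data header line starts with ">" and carries the field name in <...>.
# Anything after the closing ">" (such as "(1)") is simply not captured.
_DATA_HEADER = re.compile(r"^>[^<\n]*<([^>\n]*)>")


def data_header_field_name(line):
    """Return the field name from an SD data header line, or None."""
    match = _DATA_HEADER.match(line)
    return match.group(1) if match else None


for header in ("> <pKa> (1)", ">  <pKa>  (1) ", "> <pKa>"):
    print(repr(header), "->", data_header_field_name(header))
```

A parser written this way reads the same field name whether or not the "(1)" is present.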
dan nealschneider | senior staff developer
*he/him/his*
Schrödinger, Inc. <https://schrodinger.com/>

On Fri, Sep 29, 2023 at 1:08 AM Andrew Dalke <da...@dalkescientific.com> wrote:

> On Sep 26, 2023, at 01:17, Ling Chan <lingtrek...@gmail.com> wrote:
> > > <pKa> (1)
> > 4.0999999
> ..
> > Just wonder what was the rationale behind this extra "(1)" on the
> > property field lines (pKa and logP in the above example)?
> >
> > And is there a way to get rid of these? I am not sure if this extra
> > "(1)" is part of the standard sd format.
>
> RDKit uses the increasing value as a sort of per-file registry number.
>
> This follows the part of the standard which says "External registry
> numbers must be enclosed in parentheses."
>
> The relevant code is in Code/GraphMol/FileParsers/SDWriter.cpp :
>
>   if (d_molid >= 0) {
>     (*dp_ostream) << "(" << d_molid + 1 << ") ";
>   }
>
> There is no way to suppress this output. Not only is there no direct way
> to change the d_molid, but d_molid cannot be negative, as
> Code/GraphMol/FileParsers/MolWriters.h declares it as:
>
>   unsigned int d_molid; // the number of the molecules we wrote so far
>
> Wim suggested a post-processing approach. Another is to write the SD data
> items yourself, that is, use MolToMolBlock() to generate the connection
> table/molfile as a string, then iterate through the properties and
> generate the data items.
>
> import sys
> from rdkit import Chem
>
> def MolToSDFRecord(
>         mol,
>         includeStereo: bool = True,
>         confId: int = -1,
>         kekulize: bool = True,
>         forceV3000: bool = False):
>     mol_block = Chem.MolToMolBlock(mol, includeStereo, confId, kekulize,
>                                    forceV3000)
>
>     lines = []
>     for prop_name in mol.GetPropNames():
>         if "\n" in prop_name or ">" in prop_name or "<" in prop_name:
>             sys.stderr.write(
>                 f"WARNING: Skipping property {prop_name!r} because the "
>                 "name includes an unsupported character.\n")
>             continue
>
>         prop_value = mol.GetProp(prop_name)
>         if "\n" in prop_value:
>             if "\n\n" in prop_value or "\r\n\r\n" in prop_value:
>                 sys.stderr.write(
>                     f"WARNING: Skipping property {prop_name!r} because the "
>                     "value includes an embedded newline.\n")
>                 continue
>             if prop_value.endswith("\r\n"):
>                 prop_value = prop_value[:-2]
>             elif prop_value.endswith("\n"):
>                 prop_value = prop_value[:-1]
>
>         lines.append(f"> <{prop_name}>\n{prop_value}\n\n")
>
>     lines.append("$$$$\n")
>
>     return mol_block + "".join(lines)
>
> mol = Chem.MolFromSmiles("CCO")
> mol.SetProp("pKa","3.3\r\n")
> print(MolToSDFRecord(mol))
>
>
> Andrew
> da...@dalkescientific.com
>
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
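
For completeness, the post-processing approach mentioned in the quoted message could look roughly like the sketch below: write the SD file as usual, then strip the parenthesized number from each data header line. The function name `strip_registry_suffix`, the regular expression, and the logP value in the sample are hypothetical and only for illustration. It assumes "\n" line endings and that the only thing following the field name on a header line is a number in parentheses.

```python
import re

# Hypothetical post-processing helper; not part of RDKit.
# Matches a data header line such as ">  <pKa>  (1) " and captures the part
# up to and including the closing ">" of the field name.
_REGISTRY_SUFFIX = re.compile(
    r"^(>[^<\n]*<[^>\n]*>)[ \t]*\(\d+\)[ \t]*$", re.MULTILINE)


def strip_registry_suffix(sdf_text):
    """Remove a trailing "(N)" from every SD data header line."""
    return _REGISTRY_SUFFIX.sub(r"\1", sdf_text)


# Sample data items; the logP value is made up for this example.
sample = (
    ">  <pKa>  (1) \n"
    "4.0999999\n"
    "\n"
    ">  <logP>  (1) \n"
    "2.5\n"
    "\n"
    "$$$$\n"
)
print(strip_registry_suffix(sample), end="")
```

Only lines that start with ">" and contain a bracketed field name are touched, so a data value that happens to be "(1)" on its own line is left alone.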