Thank you, Andrew, for the information. It is good to know that this is part of the standard, so I don't need to worry now. I also like the safety-checking part of your code.
Dan, I wrote my email because I did not see any mention of this in the SD file definition documents I could find. I could have overlooked it. But if it really were not part of the definition, I/O problems could always arise. We have run into several similar situations with non-conforming files and non-conforming parsers, and I had to check the format definition to decide which side's customer support (the writer's or the reader's) to contact. This is why I am careful now. Simply updating the parsing software would not solve it, because as far as that software is concerned, it's not a bug.

Ling

On Fri., Sep. 29, 2023, 10:07 Dan Nealschneider, <dan.nealschnei...@schrodinger.com> wrote:

> I'd also be curious how the index is causing you problems. All SD reading
> code that I know about ignores those suffixes. If you're not using RDKit to
> read the SD file, maybe it would be best to update whatever it is you *are*
> using to parse the file.
>
> dan nealschneider | senior staff developer
>
> *he/him/his*
>
> Schrödinger, Inc. <https://schrodinger.com/>
>
>
> On Fri, Sep 29, 2023 at 1:08 AM Andrew Dalke <da...@dalkescientific.com>
> wrote:
>
>> On Sep 26, 2023, at 01:17, Ling Chan <lingtrek...@gmail.com> wrote:
>> > > <pKa> (1)
>> > 4.0999999
>> ..
>> > I just wonder what the rationale was behind this extra "(1)" on the
>> > property field lines (pKa and logP in the above example)?
>> >
>> > And is there a way to get rid of these? I am not sure if this extra
>> > "(1)" is part of the standard SD format.
>>
>> RDKit uses the increasing value as a sort of per-file registry number.
>>
>> This follows the part of the standard which says "External registry
>> numbers must be enclosed in parentheses."
>>
>> The relevant code is in Code/GraphMol/FileParsers/SDWriter.cpp:
>>
>>     if (d_molid >= 0) {
>>       (*dp_ostream) << "(" << d_molid + 1 << ") ";
>>     }
>>
>> There is no way to suppress this output.
>> Not only is there no direct way to change d_molid, but d_molid cannot be
>> negative, because Code/GraphMol/FileParsers/MolWriters.h declares it
>> unsigned (so the d_molid >= 0 test is always true):
>>
>>     unsigned int d_molid;  // the number of the molecules we wrote so far
>>
>>
>> Wim suggested a post-processing approach. Another is to write the SD data
>> items yourself, that is, use MolToMolBlock() to generate the connection
>> table/molfile as a string, then iterate through the properties and generate
>> the data items.
>>
>>
>>     import sys
>>     from rdkit import Chem
>>
>>     def MolToSDFRecord(
>>             mol,
>>             includeStereo: bool = True,
>>             confId: int = -1,
>>             kekulize: bool = True,
>>             forceV3000: bool = False):
>>         # The connection table (molfile), ending with the "M  END" line.
>>         mol_block = Chem.MolToMolBlock(mol, includeStereo, confId, kekulize,
>>                                        forceV3000)
>>
>>         lines = []
>>         for prop_name in mol.GetPropNames():
>>             # These characters would break the "> <name>" data header line.
>>             if "\n" in prop_name or ">" in prop_name or "<" in prop_name:
>>                 sys.stderr.write(f"WARNING: Skipping property {prop_name!r} "
>>                                  "because the name includes an unsupported "
>>                                  "character.\n")
>>                 continue
>>
>>             prop_value = mol.GetProp(prop_name)
>>             if "\n" in prop_value:
>>                 # A blank line inside the value would end the data item early.
>>                 if "\n\n" in prop_value or "\r\n\r\n" in prop_value:
>>                     sys.stderr.write(f"WARNING: Skipping property {prop_name!r} "
>>                                      "because the value includes an embedded "
>>                                      "blank line.\n")
>>                     continue
>>                 # Drop one trailing newline; the terminating blank line is added below.
>>                 if prop_value.endswith("\r\n"):
>>                     prop_value = prop_value[:-2]
>>                 elif prop_value.endswith("\n"):
>>                     prop_value = prop_value[:-1]
>>
>>             # Data header (no registry number), value, then a blank line.
>>             lines.append(f"> <{prop_name}>\n{prop_value}\n\n")
>>
>>         lines.append("$$$$\n")
>>
>>         return mol_block + "".join(lines)
>>
>>     mol = Chem.MolFromSmiles("CCO")
>>     mol.SetProp("pKa", "3.3\r\n")
>>     print(MolToSDFRecord(mol))
>>
>>
>> Andrew
>> da...@dalkescientific.com
>>
>> _______________________________________________
>> Rdkit-discuss mailing list
>> Rdkit-discuss@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>
>
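Wim's post-processing approach, mentioned in the thread, could look something like the following. This is a minimal sketch, not RDKit code: strip_registry_numbers is a hypothetical helper name, and it assumes field names contain no "<" or ">" characters. It rewrites every data header line to the bare "> <name>" form, so the parenthesized registry number (and any other optional header items on that line) is dropped.

```python
import re

# Hypothetical helper (not part of RDKit). Assumes field names contain no
# "<" or ">" characters; everything else on a data header line is dropped.
_FIELD_NAME_RE = re.compile(r"<([^<>]*)>")


def strip_registry_numbers(sdf_text: str) -> str:
    """Rewrite data headers like '>  <pKa>  (1) ' to '> <pKa>'."""
    out = []
    in_data = False        # True after the "M  END" line of the current record
    expect_header = False  # True where a new data header line may start
    for line in sdf_text.splitlines(keepends=True):
        body = line.rstrip("\r\n")
        eol = line[len(body):]  # preserve the original line ending
        if body == "$$$$":
            # End of record: the following lines start a new molfile block.
            in_data = False
            expect_header = False
        elif not in_data:
            # Inside the molfile block; only watch for its terminator.
            if body.startswith("M  END"):
                in_data = True
                expect_header = True
        elif expect_header and body.startswith(">"):
            match = _FIELD_NAME_RE.search(body)
            if match:
                line = f"> <{match.group(1)}>{eol}"
            expect_header = False
        elif body == "":
            # A blank line ends a data item, so a new header may follow.
            expect_header = True
        else:
            # A data value line, which may itself begin with ">".
            expect_header = False
        out.append(line)
    return "".join(out)
```

It walks each record with a small state machine instead of applying a regex to every line that begins with ">", so molfile lines and data values that happen to start with ">" are left untouched. To use it, read the SD file into a string, pass it through the function, and write the result to a new file.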