In addition to Andrew's suggestions, I'd also recommend that you submit a bug report to the maker of your other tool! They probably want to know about this issue - I know I would if it's one of ours...
*dan nealschneider* | lead developer
<https://www.schrodinger.com/>

On Fri, Oct 2, 2020 at 2:26 PM Andrew Dalke <da...@dalkescientific.com> wrote:

> Hi Markus,
>
> > On Oct 2, 2020, at 19:56, Markus Metz <metm...@gmail.com> wrote:
> > I have a question about the SD file format.
> > When I write charged molecules via rdkit I noticed that the charge
> > definition in the atom block is not written.
> > The charge is written at the end of the entry.
> > So far this worked perfectly fine for me.
>
>
> The ctfile documentation I have from 2011 says this of the charge
> definition in the atom block:
>
>   Wider range of values in M CHG and M RAD lines below. Retained
>   for compatibility with older Ctabs, M CHG and M RAD lines take
>   precedence.
>
> and
>
>   With Ctab version V2000, the dd and ccc fields have been
>   superseded by the M ISO, M CHG, and M RAD lines in the properties
>   block, described below. For compatibility, all releases since ISIS 1.0:
>
>     • Write appropriate values in both places if the values
>       are in the old range.
>
>     • Use the atom block fields if there are no M ISO, M CHG, or
>       M RAD lines in the properties block.
>
>   Support for these atom block fields might be removed in future
>   releases of Symyx software.
>
> Further, I looked into this when I wrote the blog post
> http://www.dalkescientific.com/writings/diary/archive/2020/09/25/mixing_text_and_chemistry_toolkits.html
> a couple of weeks ago, and found that the 1992 JCICS paper "Description of
> Several Chemical Structure File Formats Used by Computer Programs Developed
> at Molecular Design Limited" by Dalby et al. has the "Wider range ...
> Retained for compatibility with older Ctabs" text in it.
>
> So including the charge in the atom block as well as in the properties
> block is a 28+ year old backwards compatibility practice.
>
>
> > Now, I am using a program which reads the atom block charge info only.
> > Is there a way in rdkit to enable the charge written in the atom block?
>
> No. The code in Code/GraphMol/FileParsers/MolFileWriter.cpp has it
> hard-coded to 0.
>
> > Do you have any thoughts on this?
>
> The two I can think of are:
>  - post-processing to add it back in,
>  - pass it through another toolkit which adds the duplicated charge
>    information
>
>
> I've attached a program for the first of these options. The command-line
> tool reads an SDF and generates a new SDF with the charges from the
> "M CHG" lines copied into the atom block. Here's the --help:
>
> ===================
> usage: set_atom_block_charges.py [-h] [--output FILENAME] [--roundtrip]
>                                  [--verify] [--no-set] [FILENAME]
>
> copy charge information from the 'M CHG' data line to the atom block
>
> positional arguments:
>   FILENAME              input filename (default: stdin)
>
> optional arguments:
>   -h, --help            show this help message and exit
>   --output FILENAME, -o FILENAME
>                         output file name (default: stdout)
>   --roundtrip           use RDKit to parse the record and regenerate the
>                         SDF record
>   --verify              ensure the input and output SMILES match
>   --no-set              don't set the charges (useful if you want to see
>                         the round-trip output)
> ===================
>
> This depends on the latest commercial version of chemfp to identify records
> in an SDF and to help with the verification.
>
> While chemfp is not open source, the base license lets you use this
> functionality for in-house use. (See the file for installation details; the
> pre-compiled package only installs on Linux-based OSes.)
>
> Or, you can grab set_atom_block_charges() from the code (and some code it
> depends on) so you don't need chemfp at all.
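
For anyone who wants to try the post-processing route without chemfp, here is a minimal sketch of the idea in plain Python. To be clear, this is not Andrew's attached program: the function body, the CHARGE_TO_CCC table name, and the EXAMPLE record (built from the piperidine record in this thread) are my own illustration. It assumes a single well-formed V2000 molblock, ignores M RAD and M ISO lines, and leaves the atom block field at 0 for charges outside the old -3..+3 range, as the ctfile text quoted above describes.

```python
# Old-style atom block "ccc" codes from the V2000 ctfile format.
# (Code 4 means doublet radical and is not handled in this sketch.)
CHARGE_TO_CCC = {3: 1, 2: 2, 1: 3, 0: 0, -1: 5, -2: 6, -3: 7}


def set_atom_block_charges(molblock):
    """Copy charges from 'M  CHG' lines into the atom block of one V2000 molblock."""
    lines = molblock.split("\n")
    counts = lines[3]
    if not counts.rstrip().endswith("V2000"):
        raise ValueError("only V2000 connection tables are handled")
    num_atoms = int(counts[0:3])
    first_atom = 4  # three header lines plus the counts line

    # Collect {atom number: formal charge} from the properties block.
    charges = {}
    for line in lines[first_atom + num_atoms:]:
        if line.startswith("M  END"):
            break
        if line.startswith("M  CHG"):
            fields = line[6:].split()
            for i in range(int(fields[0])):
                charges[int(fields[1 + 2 * i])] = int(fields[2 + 2 * i])

    # Rewrite the 3-character ccc field (columns 37-39) of each charged atom.
    for atom_number, charge in charges.items():
        ccc = CHARGE_TO_CCC.get(charge, 0)  # out of range: leave as 0
        i = first_atom + atom_number - 1
        line = lines[i].ljust(39)
        lines[i] = line[:36] + f"{ccc:3d}" + line[39:]
    return "\n".join(lines)


# The piperidine record from this thread, as RDKit writes it.
EXAMPLE = "\n".join([
    "piperidine",
    "     RDKit          3D",
    "",
    "  6  6  0  0  1  0  0  0  0  0999 V2000",
    "   -1.4650    0.7843   -0.9210 N   0  0  0  0  0  0  0  0  0  0  0  0",
    "    0.0601    0.7265   -0.6801 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    0.6663   -0.3976   -1.5418 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "   -0.0188   -1.7539   -1.2886 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "   -1.5436   -1.6645   -1.4884 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "   -2.1760   -0.5554   -0.6261 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "  1  2  1  0",
    "  1  6  1  0",
    "  2  3  1  0",
    "  3  4  1  0",
    "  4  5  1  0",
    "  5  6  1  0",
    "M  CHG  1   1   1",
    "M  END",
])

updated = set_atom_block_charges(EXAMPLE)
```

For a whole SDF you would still need to split the file into records first, which is the part Andrew's tool delegates to chemfp. Note that the molfile format is column-based, so the sketch slices fixed columns in the atom line rather than splitting on whitespace.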
>
> In the following, I round-trip the input through RDKit but don't set the
> atom block charges:
>
> % python set_atom_block_charges.py piperidine.sdf --roundtrip --no-set
> piperidine
>      RDKit          3D
>
>   6  6  0  0  1  0  0  0  0  0999 V2000
>    -1.4650    0.7843   -0.9210 N   0  0  0  0  0  0  0  0  0  0  0  0
>     0.0601    0.7265   -0.6801 C   0  0  0  0  0  0  0  0  0  0  0  0
>     0.6663   -0.3976   -1.5418 C   0  0  0  0  0  0  0  0  0  0  0  0
>    -0.0188   -1.7539   -1.2886 C   0  0  0  0  0  0  0  0  0  0  0  0
>    -1.5436   -1.6645   -1.4884 C   0  0  0  0  0  0  0  0  0  0  0  0
>    -2.1760   -0.5554   -0.6261 C   0  0  0  0  0  0  0  0  0  0  0  0
>   1  2  1  0
>   1  6  1  0
>   2  3  1  0
>   3  4  1  0
>   4  5  1  0
>   5  6  1  0
> M  CHG  1   1   1
> M  END
> $$$$
>
> In the following, I round-trip it through RDKit then let the tool set the
> charges in the atom block.
>
> % python set_atom_block_charges.py piperidine.sdf --roundtrip
> piperidine
>      RDKit          3D
>
>   6  6  0  0  1  0  0  0  0  0999 V2000
>    -1.4650    0.7843   -0.9210 N   0  3  0  0  0  0  0  0  0  0  0  0
>     0.0601    0.7265   -0.6801 C   0  0  0  0  0  0  0  0  0  0  0  0
>     0.6663   -0.3976   -1.5418 C   0  0  0  0  0  0  0  0  0  0  0  0
>    -0.0188   -1.7539   -1.2886 C   0  0  0  0  0  0  0  0  0  0  0  0
>    -1.5436   -1.6645   -1.4884 C   0  0  0  0  0  0  0  0  0  0  0  0
>    -2.1760   -0.5554   -0.6261 C   0  0  0  0  0  0  0  0  0  0  0  0
>   1  2  1  0
>   1  6  1  0
>   2  3  1  0
>   3  4  1  0
>   4  5  1  0
>   5  6  1  0
> M  CHG  1   1   1
> M  END
> $$$$
>
> You can also use the --verify flag to generate and compare the SMILES
> strings before and after the conversion.
>
> Best regards,
>
>
>                                 Andrew
>                                 da...@dalkescientific.com
>
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss