Thank you Andrew for the information. It is good to know that this is part
of the standard. So I don't need to worry now. And I like the safety
checking part of your code.

Dan, I wrote my email because the SD file definition documents that I
could find did not mention this. I may have overlooked it. But if it
really were not part of the definition, I/O problems would always be
possible. We have encountered several similar situations with
non-conforming files and non-conforming parsers, and each time I had to
check the format definition to determine which customer support (writer
side or reader side) to write to. This is why I am careful now. Updating
the software you use would not solve it, because it's not a bug as far as
the parsing software is concerned.

Ling

On Fri., Sep. 29, 2023, 10:07 Dan Nealschneider, <
dan.nealschnei...@schrodinger.com> wrote:

> I'd also be curious how the index is causing you problems. All SD reading
> code that I know about ignores those suffixes. If you're not using RDKit to
> read the SD file, maybe it would be best to update whatever it is you *are*
> using to parse the file.
>
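To illustrate why a reader can ignore the suffix, here is a minimal sketch of pulling the field name out of a data header line. It is not RDKit's parser or any vendor's; the function name `data_header_field_name` is hypothetical, and the only assumption is that the field name is the text between the first pair of angle brackets after the leading ">".

```python
import re
from typing import Optional

# The field name is whatever sits between "<" and ">" on the header line.
_FIELD_NAME = re.compile(r"<([^<>]*)>")


def data_header_field_name(line: str) -> Optional[str]:
    """Return the field name of an SD data header line, or None.

    Anything after the closing ">" of the name, such as "(1)",
    is simply not looked at.
    """
    if not line.startswith(">"):
        return None
    # Start searching after the leading ">" that marks a data header.
    match = _FIELD_NAME.search(line, 1)
    return match.group(1) if match else None


print(data_header_field_name(">  <pKa>  (1) "))
print(data_header_field_name(">  <pKa>"))
```

Both calls give the same field name, so a reader written this way is indifferent to whether the "(1)" is present.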
> dan nealschneider | senior staff developer
>
> *he/him/his*
>
>
> On Fri, Sep 29, 2023 at 1:08 AM Andrew Dalke <da...@dalkescientific.com>
> wrote:
>
>> On Sep 26, 2023, at 01:17, Ling Chan <lingtrek...@gmail.com> wrote:
>> > >  <pKa>  (1)
>> > 4.0999999
>>   ..
>> > Just wonder what was the rationale behind this extra "(1)" on the
>> > property field lines (pKa and logP in the above example)?
>> >
>> > And is there a way to get rid of these? I am not sure if this extra
>> > "(1)" is part of the standard sd format.
>>
>> RDKit uses the increasing value as a sort of per-file registry number.
>>
>> This follows the part of the standard which says "External registry
>> numbers must be enclosed in parentheses."
>>
>> The relevant code is in Code/GraphMol/FileParsers/SDWriter.cpp :
>>
>>   if (d_molid >= 0) {
>>     (*dp_ostream) << "(" << d_molid + 1 << ") ";
>>   }
>>
>> There is no way to suppress this output. Not only is there no direct way
>> to change the d_molid, but d_molid cannot be negative as
>> Code/GraphMol/FileParsers/MolWriters.h declares it as:
>>
>>   unsigned int d_molid;      // the number of the molecules we wrote so far
>>
>>
>> Wim suggested a post-processing approach. Another is to write the SD data
>> items yourself, that is, use MolToMolBlock() to generate the connection
>> table/molfile as a string, then iterate through the properties and generate
>> the data items.
>>
>>
>> import sys
>> from rdkit import Chem
>>
>> def MolToSDFRecord(
>>         mol,
>>         includeStereo: bool = True,
>>         confId: int = -1,
>>         kekulize: bool = True,
>>         forceV3000: bool = False):
>>     mol_block = Chem.MolToMolBlock(mol, includeStereo, confId, kekulize,
>>                                    forceV3000)
>>
>>     lines = []
>>     for prop_name in mol.GetPropNames():
>>         if "\n" in prop_name or ">" in prop_name or "<" in prop_name:
>>             sys.stderr.write(f"WARNING: Skipping property {prop_name!r} because the "
>>                              "name includes an unsupported character.\n")
>>             continue
>>
>>         prop_value = mol.GetProp(prop_name)
>>         if "\n" in prop_value:
>>             if "\n\n" in prop_value or "\r\n\r\n" in prop_value:
>>                 sys.stderr.write(f"WARNING: Skipping property {prop_name!r} because the "
>>                                  "value includes an embedded blank line.\n")
>>                 continue
>>             if prop_value.endswith("\r\n"):
>>                 prop_value = prop_value[:-2]
>>             elif prop_value.endswith("\n"):
>>                 prop_value = prop_value[:-1]
>>
>>         lines.append(f"> <{prop_name}>\n{prop_value}\n\n")
>>
>>     lines.append("$$$$\n")
>>
>>     return mol_block + "".join(lines)
>>
>> mol = Chem.MolFromSmiles("CCO")
>> mol.SetProp("pKa","3.3\r\n")
>> print(MolToSDFRecord(mol))
>>
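The post-processing approach mentioned above could be sketched roughly as follows. This is only an illustration, not Wim's actual suggestion: the function name `strip_registry_suffix` is hypothetical, and it assumes that data items appear only between an "M  END" line and the "$$$$" record terminator, and that the suffix is a parenthesized integer at the end of the data header line.

```python
import re

# A data header such as ">  <pKa>  (1) ". Group 1 keeps everything up to
# and including the closing ">" of the field name.
_HEADER_WITH_ID = re.compile(r"^(>\s*<[^>]*>)\s*\(\d+\)\s*$")


def strip_registry_suffix(sdf_text: str) -> str:
    """Return sdf_text with the "(N)" suffix removed from data headers."""
    out = []
    in_data = False  # True between "M  END" and "$$$$"
    for line in sdf_text.splitlines(keepends=True):
        body = line.rstrip("\r\n")
        eol = line[len(body):]  # preserve the original line ending
        if body.startswith("M  END"):
            in_data = True
        elif body.startswith("$$$$"):
            in_data = False
        elif in_data:
            match = _HEADER_WITH_ID.match(body)
            if match:
                line = match.group(1) + eol
        out.append(line)
    return "".join(out)


record = (
    "\n  RDKit\n\nM  END\n"
    ">  <pKa>  (1) \n4.0999999\n\n"
    "$$$$\n"
)
print(strip_registry_suffix(record))
```

Lines in the connection table are left untouched, so the sketch can be run over the text of a whole SD file. A data value that itself looks like a header with a "(N)" suffix would also be rewritten, which is a limitation of this simple approach.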
>>
>>                                 Andrew
>>                                 da...@dalkescientific.com
>>
>>
>>
>>
>> _______________________________________________
>> Rdkit-discuss mailing list
>> Rdkit-discuss@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>
>
