On Sep 26, 2023, at 01:17, Ling Chan <lingtrek...@gmail.com> wrote:
> >  <pKa>  (1) 
> 4.0999999
  ..
> Just wonder what was the rationale behind this extra "(1)" on the property 
> field lines (pKa and logP in the above example)?
> 
> And is there a way to get rid of these? I am not sure if this extra "(1)" is 
> part of the standard sd format.

RDKit uses the record's position in the output file (1 for the first molecule 
written, 2 for the second, and so on) as a sort of per-file registry number.

This follows the part of the standard which says "External registry numbers 
must be enclosed in parentheses."

The relevant code is in Code/GraphMol/FileParsers/SDWriter.cpp :

  if (d_molid >= 0) {
    (*dp_ostream) << "(" << d_molid + 1 << ") ";
  }

There is no way to suppress this output. Not only is there no direct way to 
change d_molid, but d_molid cannot be negative, so the test above is always 
true. Code/GraphMol/FileParsers/MolWriters.h declares it as:

  unsigned int d_molid;      // the number of the molecules we wrote so far


Wim suggested a post-processing approach. Another is to write the SD data items 
yourself, that is, use MolToMolBlock() to generate the connection table/molfile 
as a string, then iterate through the properties and generate the data items.


import sys
from rdkit import Chem

def MolToSDFRecord(
        mol,
        includeStereo: bool = True,
        confId: int = -1,
        kekulize: bool = True,
        forceV3000: bool = False):
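    # Generate the connection table (everything up to and including "M  END").
    mol_block = Chem.MolToMolBlock(
        mol, includeStereo=includeStereo, confId=confId,
        kekulize=kekulize, forceV3000=forceV3000)

    lines = []
    for prop_name in mol.GetPropNames():
        # These characters would corrupt the "> <name>" header line.
        if "\n" in prop_name or ">" in prop_name or "<" in prop_name:
            sys.stderr.write(
                f"WARNING: Skipping property {prop_name!r} because the "
                "name includes an unsupported character.\n")
            continue

        prop_value = mol.GetProp(prop_name)
        if "\n" in prop_value:
            # A blank line ends a data item, so it cannot appear in a value.
            if "\n\n" in prop_value or "\r\n\r\n" in prop_value:
                sys.stderr.write(
                    f"WARNING: Skipping property {prop_name!r} because the "
                    "value includes an embedded blank line.\n")
                continue
            # Drop one trailing newline; the f-string below adds it back.
            if prop_value.endswith("\r\n"):
                prop_value = prop_value[:-2]
            elif prop_value.endswith("\n"):
                prop_value = prop_value[:-1]

        # Data header without a registry number, the value, then a blank line.
        lines.append(f"> <{prop_name}>\n{prop_value}\n\n")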
    
    lines.append("$$$$\n")
    
    return mol_block + "".join(lines)

mol = Chem.MolFromSmiles("CCO")
mol.SetProp("pKa","3.3\r\n")
print(MolToSDFRecord(mol))
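
For comparison, here is a rough sketch of what a post-processing pass over 
SDWriter output might look like. This is my own illustration, not Wim's code. 
The function name strip_registry_numbers is hypothetical, and it assumes that 
any all-digit "(N)" at the end of a data header came from SDWriter. If your 
files carry real numeric external registry numbers, it will remove those too.

```python
import re

# A data header that ends with a parenthesized integer, such as
# ">  <pKa>  (1) ". Group 1 is the header up to and including "<name>".
_HEADER_WITH_ID = re.compile(r"^(>.*<[^<>]*>)\s*\(\d+\)\s*$")

def strip_registry_numbers(sdf_text: str) -> str:
    """Remove a trailing "(N)" from the data header lines of SD file text."""
    out = []
    in_data = False        # True between "M  END" and "$$$$"
    at_item_start = False  # True where a data header may appear
    for line in sdf_text.splitlines(keepends=True):
        text = line.rstrip("\r\n")
        eol = line[len(text):]  # preserve the original line ending
        if text == "$$$$":
            in_data = False
        elif not in_data:
            if text.startswith("M  END"):
                in_data = True
                at_item_start = True
        elif text.strip() == "":
            # A blank line ends a data item; a header may follow.
            at_item_start = True
        else:
            if at_item_start and text.startswith(">"):
                m = _HEADER_WITH_ID.match(text)
                if m:
                    line = m.group(1) + eol
            # Anything else in an item is a value line; leave it alone.
            at_item_start = False
        out.append(line)
    return "".join(out)
```

Headers are only rewritten at the start of a data item, so a value line that 
happens to begin with ">" is left untouched. You would call it on the text of 
a whole file, for example strip_registry_numbers(open("in.sdf").read()), where 
"in.sdf" is a placeholder filename.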


                                Andrew
                                da...@dalkescientific.com




_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
