On Apr 16, 2018, at 05:37, Patrick Walters <wpwalt...@gmail.com> wrote:
> Thanks Andrew, the SMILES approach seemed to have quite a few edge cases so I 
> wrote something to work directly on a molecule. 

That's the approach I started with, until I figured out that it doesn't 
preserve chirality.

If I change the end of your code to:

from mmpdblib import smiles_syntax

def weld_dalke(core, r_groups):
    s1 = smiles_syntax.convert_labeled_wildcards_to_closures(core)
    s2 = smiles_syntax.convert_labeled_wildcards_to_closures(r_groups)
    return Chem.CanonSmiles(s1+"."+s2)

if __name__ == "__main__":
    mol_to_weld = Chem.MolFromSmiles(
    welded_mol = weld_r_groups(mol_to_weld)
    print("Expected  :", Chem.CanonSmiles("N[C@](F)(Cl)O"))
    print("Direct    :", Chem.MolToSmiles(welded_mol, isomericSmiles=True))
    print("Via SMILES:", weld_dalke("[*:1][C@](F)(Cl)O", "N[*:1]"))

These should print identical SMILES strings, but instead give:

Expected  : N[C@](O)(F)Cl
Direct    : N[C@@](O)(F)Cl
Via SMILES: N[C@](O)(F)Cl

If chirality preservation isn't a concern, then there's no problem.
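
For anyone without mmpdblib at hand, here is a minimal pure-Python sketch of the idea behind the SMILES-based weld. It is not mmpdblib's implementation: the names wildcard_to_closure and weld_smiles, and the closure offset of 90, are made up for illustration. It assumes exactly one labeled wildcard per string, sitting either at the very start (directly followed by an atom) or at the very end (directly after an atom or ring-closure digit). The wildcard is replaced with a %NN ring closure on the neighboring atom, so that atom keeps its neighbor order, which is why the chirality mark survives.

```python
import re

# A labeled wildcard such as [*:1]; the label picks the closure number.
_LABELED_WILDCARD = re.compile(r"\[\*:(\d+)\]")
# One atom token: a bracket atom or an organic-subset atom.
_ATOM = re.compile(r"\[[^\]]+\]|Cl|Br|[BCNOPSFI]|[bcnops]")


def wildcard_to_closure(smiles, base=90):
    """Replace a single labeled wildcard with a %NN ring closure.

    Simplified sketch: only a leading or a trailing wildcard is handled.
    The offset `base` is a hypothetical choice to stay clear of small
    ring-closure numbers already used in the input.
    """
    m = _LABELED_WILDCARD.search(smiles)
    if m is None:
        raise ValueError("no labeled wildcard in %r" % (smiles,))
    number = base + int(m.group(1))
    if number > 99:
        raise ValueError("label too large for a two-digit closure")
    closure = "%%%d" % number

    if m.start() == 0:
        # "[*:1]X..." becomes "X%91...": the closure takes the wildcard's
        # place as the first neighbor of X.
        rest = smiles[m.end():]
        atom = _ATOM.match(rest)
        if atom is None:
            raise ValueError("unsupported: wildcard not followed by an atom")
        return atom.group(0) + closure + rest[atom.end():]

    if m.end() == len(smiles):
        # "...X[*:1]" becomes "...X%91".
        prefix = smiles[:m.start()]
        if not (prefix[-1].isalnum() or prefix[-1] == "]"):
            raise ValueError("unsupported: wildcard after a branch or bond")
        return prefix + closure

    raise ValueError("unsupported: wildcard in the middle of the SMILES")


def weld_smiles(core, r_group):
    """Join the two fragments; the matching closures form the new bond."""
    return wildcard_to_closure(core) + "." + wildcard_to_closure(r_group)


print(weld_smiles("[*:1][C@](F)(Cl)O", "N[*:1]"))
```

The result is an uncanonicalized dot-separated SMILES; passing it through Chem.CanonSmiles, as weld_dalke does above, gives the canonical form.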

BTW, your current code assumes there will only be one attachment point on an 
atom. For example, the input
creates the output

It's not hard to fix, and I think it's more of a d'oh! issue than a design flaw.

In a quick benchmark I put together just now, I found that my SMILES syntax 
manipulation approach was about twice as fast at turning the two core/R-group 
SMILES strings into a molecule.



Rdkit-discuss mailing list
