On Apr 16, 2018, at 05:37, Patrick Walters <[email protected]> wrote:
>
> Thanks Andrew, the SMILES approach seemed to have quite a few edge cases so I
> wrote something to work directly on a molecule.
That's the approach I started with, until I figured out that it doesn't
preserve chirality.
If I change the end of your code to:
==========
from mmpdblib import smiles_syntax

# Chem and weld_r_groups() come from the earlier part of your code.
def weld_dalke(core, r_groups):
    # Turn each labeled wildcard into a ring closure, then join the fragments.
    s1 = smiles_syntax.convert_labeled_wildcards_to_closures(core)
    s2 = smiles_syntax.convert_labeled_wildcards_to_closures(r_groups)
    return Chem.CanonSmiles(s1+"."+s2)

if __name__ == "__main__":
    mol_to_weld = Chem.MolFromSmiles(
        "[*:1][C@](F)(Cl)O.N[*:1]")
    welded_mol = weld_r_groups(mol_to_weld)
    print("Expected :", Chem.CanonSmiles("N[C@](F)(Cl)O"))
    print("Direct :", Chem.MolToSmiles(welded_mol, isomericSmiles=True))
    print("Via SMILES:", weld_dalke("[*:1][C@](F)(Cl)O", "N[*:1]"))
==========
These should print identical SMILES strings, but instead give:
Expected : N[C@](O)(F)Cl
Direct : N[C@@](O)(F)Cl
Via SMILES: N[C@](O)(F)Cl
If chirality preservation isn't a concern, then there's no problem.
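
To illustrate why the SMILES route can keep the chirality: a labeled wildcard
is rewritten as a ring closure in the same position, so the neighbor order
around the stereocenter is not disturbed. Here is a minimal pure-Python sketch
of that idea. It is not mmpdblib's implementation. The function name
wildcards_to_closures and the "offset" closure numbering are my own
assumptions, and it only handles a wildcard with an implicit single bond that
either starts a fragment or directly follows an atom at the end of a fragment
or branch. Anything else raises ValueError.

```python
import re

# A labeled wildcard such as [*:1]; the group captures the label.
_WILDCARD = re.compile(r"\[\*:(\d+)\]")
# One atom token: a bracket atom or an organic-subset atom symbol.
_ATOM = re.compile(r"\[[^\]]+\]|Cl|Br|[BCNOPSFIbcnops]")


def wildcards_to_closures(smiles, offset=90):
    """Rewrite each [*:n] as the ring closure %(offset+n). Hypothetical helper."""
    out = []
    pos = 0
    while pos < len(smiles):
        m = _WILDCARD.match(smiles, pos)
        if m is None:
            # Not a labeled wildcard: copy the character through unchanged.
            out.append(smiles[pos])
            pos += 1
            continue
        number = offset + int(m.group(1))
        if not 10 <= number <= 99:
            raise ValueError("closure number must have two digits")
        closure = "%" + str(number)
        end = m.end()
        if pos == 0 or smiles[pos - 1] == ".":
            # Leading wildcard: hang the closure on the atom that follows it.
            atom = _ATOM.match(smiles, end)
            if atom is None or _WILDCARD.match(smiles, end):
                raise ValueError("unsupported: no plain atom after wildcard")
            token = atom.group(0)
            if "@" in token and "H" in token:
                # The implicit H would move ahead of the closure in the
                # neighbor order, which this sketch does not correct for.
                raise ValueError("unsupported: chiral atom with implicit H")
            out.append(token + closure)
            pos = atom.end()
        elif (smiles[pos - 1].isalnum() or smiles[pos - 1] == "]") and (
            end == len(smiles) or smiles[end] in ".)"
        ):
            # Trailing wildcard: the closure attaches to the preceding atom.
            out.append(closure)
            pos = end
        else:
            raise ValueError("unsupported wildcard position")
    return "".join(out)


print(wildcards_to_closures("[*:1][C@](F)(Cl)O.N[*:1]"))
```

The rewritten string is [C@]%91(F)(Cl)O.N%91, where the closure takes the
wildcard's place as the first neighbor of the stereocenter. Passing that to
Chem.CanonSmiles should, as far as I can tell, give the expected result above.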
BTW, your current code assumes there will only be one attachment point on an
atom. For example, the input
[*:1][C@]([*:2])(Cl)O.N[*:1].F[*:2]
creates the output
N.O[C](F)Cl
It's not hard to fix; I think it's more of a d'oh! issue.
In a quick benchmark I put together just now, I found that my SMILES syntax
manipulation approach was about twice as fast at turning the two core/R-group
SMILES strings into a molecule.
Cheers,
Andrew
[email protected]
_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss