On Nov 9, 2017, at 21:49, Brian Cole <col...@gmail.com> wrote:
> Certainly, but thousands of lines of Python doesn't fit in an email in an
> easily digestible way. :-)
I'll restate things since I wasn't clear. While this step may be what you need
for the way you structure things, there might be a better way to structure
things. I can't tell because I don't know what it is you are trying to do.
> The reason I need to drop into a real RDKit molecule is because I want to be
> able to attach to any implicit hydrogen for my application. I couldn't think
> of an easy regular expression that located an atom block with one or more
> implicit hydrogens.
There isn't one. That requires at least a context-free grammar because it needs
to count the valence used by branches, and branches can be arbitrarily nested.
I think your "any implicit hydrogen" will have problems when the implicit
hydrogen count is specified in square brackets, as with a chiral hydrogen, or
an atom outside of the organic subset, or one with another property specified
(e.g., isotopes or charge).
Leaving the tricky chiral hydrogen aside, if the bracketed H count is left
untouched you're turning an atom like a silicon with an implicit hydrogen count
of 2 and a valence of 4 into one where the silicon is now 5-valent.
If you have some way to annotate which atoms have at least one implicit
hydrogen then you can use the regular expression from my last email and, if the
matched atom is written in square brackets, reach in and reduce its H count by 1.
You'll still need some special code to deal with chiral hydrogens.
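
As a rough illustration of the "reach in and reduce the H count" step, here is
a minimal sketch in plain Python. The pattern BRACKET_ATOM and the function
decrement_bracket_h are hypothetical names made up for this example (this is
not the regular expression from the earlier email), and the bracket-atom layout
it assumes is simplified: optional isotope, element symbol, optional @ or @@,
optional H count, then everything else. It refuses chiral atoms rather than
guessing, since those need the special handling mentioned above.

```python
import re

# Assumed, simplified layout of a bracket atom:
#   [ isotope? symbol chirality? hcount? rest ]
# Hypothetical helper pattern for illustration only.
BRACKET_ATOM = re.compile(
    r"\[(?P<isotope>\d+)?"
    r"(?P<symbol>[A-Z][a-z]?|[a-z][a-z]?|\*)"
    r"(?P<chiral>@@?)?"
    r"(?P<hcount>H\d?)?"
    r"(?P<rest>[^\]]*)\]"
)


def decrement_bracket_h(atom):
    """Return the bracket atom with its explicit H count reduced by one."""
    m = BRACKET_ATOM.fullmatch(atom)
    if m is None:
        raise ValueError(f"not a bracket atom: {atom!r}")
    if m.group("chiral"):
        # Chiral centers need separate handling; do not guess here.
        raise ValueError(f"chiral atom needs special handling: {atom!r}")
    hcount = m.group("hcount")
    if hcount is None:
        raise ValueError(f"no hydrogens to remove: {atom!r}")
    # A bare "H" means one hydrogen; "H2", "H3", ... give the count.
    n = int(hcount[1:]) if len(hcount) > 1 else 1
    if n == 0:
        raise ValueError(f"no hydrogens to remove: {atom!r}")
    n -= 1
    new_h = "" if n == 0 else ("H" if n == 1 else f"H{n}")
    return "[" + (m.group("isotope") or "") + m.group("symbol") + new_h + m.group("rest") + "]"
```

With this sketch a silicon written as [SiH2] comes back with one hydrogen
fewer, so adding one new bond leaves it 4-valent instead of 5-valent.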
BTW, I don't think you need ring closures for this at all. You have a set of
fragments, where you know which atom will be attached, and I believe you
control the ordering of the atoms in that fragment.
If you use MolToSmiles(mol, rootedAtAtom=idx) so that the attachment atom is
always first, e.g. place the "O" in the phenol first:
Oc1ccccc1
then you can attach it to the core at a given point using a branch, e.g., to
attach it to the 4th atom of:
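[C@]1(C)CCO1 -> [C@]1(C)CC(Oc2ccccc2)O1
(The phenol's ring closure is renumbered to 2 here because the core's ring
closure 1 is still open at the attachment point; reusing 1 inside the branch
would close the core's ring onto the phenyl carbon instead.)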
The result is the original string up to and including the 4th regular expression
match (the atom plus any ring closures that follow it), followed by '(' followed
by the rooted fragment followed by ')' followed by the rest of the original
string. If the matched atom states its H count explicitly, the match itself also
needs modifying to reduce that count by 1. All of this assumes the core
attachment point has at least one implicit, non-chiral hydrogen on it.
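
The string splice described above could be sketched roughly as follows, in
plain Python without RDKit. ATOM_RE is an assumed, simplified atom pattern
(bracket atoms, the organic subset, and "*", each followed by any ring
closures); it is a stand-in, not the regular expression from the earlier email.
The names attach_branch and fix_atom are likewise made up for this example.
Because a bracket atom states its H count explicitly, the sketch asks the
caller to pass a fix_atom function that rewrites the atom, and raises if none
is given. It does not check for ring-closure collisions, so the fragment must
use closure numbers that are not open in the core at the attachment point.

```python
import re

# Assumed, simplified SMILES atom pattern: one atom token followed by any
# ring closures (each optionally preceded by a bond symbol).
ATOM_RE = re.compile(
    r"(\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]|\*)"
    r"((?:[-=#$:/\\]?(?:%\d\d|\d))*)"
)


def attach_branch(core, atom_index, fragment, fix_atom=None):
    """Splice `fragment` in as a branch on the atom_index-th atom (1-based).

    `fragment` must be rooted at its attachment atom, and its ring-closure
    numbers must not collide with closures still open in `core`.
    """
    matches = list(ATOM_RE.finditer(core))
    if not 1 <= atom_index <= len(matches):
        raise ValueError(f"core has {len(matches)} atoms, no atom {atom_index}")
    m = matches[atom_index - 1]
    atom, closures = m.group(1), m.group(2)
    if atom.startswith("["):
        # Bracket atoms carry an explicit H count that must drop by one.
        if fix_atom is None:
            raise ValueError(f"bracket atom {atom!r} needs a fix_atom function")
        atom = fix_atom(atom)
    # prefix + atom + its ring closures + "(" fragment ")" + rest of the core
    return core[:m.start()] + atom + closures + "(" + fragment + ")" + core[m.end():]


if __name__ == "__main__":
    print(attach_branch("[C@]1(C)CCO1", 4, "Oc2ccccc2"))
```

Placing the branch directly after the atom's ring closures keeps the splice a
pure string operation, but it also changes the neighbor order at that atom,
which is one more reason chiral attachment points need separate treatment.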
It feels slightly less tricky than the ring closure solution, though still