My solution for the problem was the following:

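from rdkit import Chem

qmol = Chem.MolFromMolBlock(molblock)
for atom in qmol.GetAtoms():
    if atom.HasQuery():
        continue  # leave query atoms such as [C,O] lists untouched
    # record the current H count as explicit Hs on every plain atom
    atom.SetNumExplicitHs(atom.GetTotalNumHs())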

This gives a SMARTS like this:

[#7]1(-[#6](-[#6H2]-[#6,#8]-[#6H](-[#6H2]-1)-[*])=[#8])-[*]
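A minimal end-to-end sketch of this approach, assuming the pinned molecule is turned into a query by writing it out with Chem.MolToSmarts and parsing it back with Chem.MolFromSmarts (the step that yields a SMARTS like the one above). The helper name pinned_h_query and the N-methylacetamide template are illustrative only; the template is generated from SMILES so that no MolBlock has to be typed in.

```python
from rdkit import Chem


def pinned_h_query(molblock):
    """Hypothetical helper: the loop above plus an assumed SMARTS round trip."""
    qmol = Chem.MolFromMolBlock(molblock)
    for atom in qmol.GetAtoms():
        if atom.HasQuery():
            continue  # list atoms, R groups etc. are left alone
        atom.SetNumExplicitHs(atom.GetTotalNumHs())
    # Writing SMARTS emits the explicit H count of plain atoms, so parsing
    # it back turns that count into an H-count constraint.
    return Chem.MolFromSmarts(Chem.MolToSmarts(qmol))


# Illustrative template: N-methylacetamide, converted to a MolBlock.
template = Chem.MolToMolBlock(Chem.MolFromSmiles('CNC(C)=O'))
query = pinned_h_query(template)

# The template itself, N,N-dimethylacetamide, N-methylpropanamide
for smi in ('CNC(C)=O', 'CN(C)C(C)=O', 'CNC(=O)CC'):
    print(smi, Chem.MolFromSmiles(smi).HasSubstructMatch(query))
```

Only the template itself should match; the two analogues add a substituent where the template had an H. As the SMARTS above also shows (e.g. the ring [#7] and the carbonyl [#6]), atoms with zero Hs get no H term at all, so they stay open to whatever their valence allows.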

This may be good enough for this specific user; however, it doesn't solve
the problem of the [C,O] query atom [#6,#8]. If that atom matches a C, the
query would still allow additional substitution on it.
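To make the concern concrete, here is a small sketch using a hypothetical three-atom pattern with the same [#6,#8] list. It also shows one possible workaround, assuming it is acceptable to put an H count on the carbon branch of the list only ([#6H2,#8]); this illustrates SMARTS semantics and is not what either approach in this thread does.

```python
from rdkit import Chem

# Hypothetical pattern: a [C,O] list atom between two methyl groups.
loose = Chem.MolFromSmarts('[#6H3]-[#6,#8]-[#6H3]')
# Same pattern, but the carbon alternative must carry exactly two Hs.
# (Implicit AND binds tighter than ",", so this reads (#6 & H2) OR #8.)
strict = Chem.MolFromSmarts('[#6H3]-[#6H2,#8]-[#6H3]')

targets = {
    'propane': 'CCC',
    'dimethyl ether': 'COC',
    'isobutane': 'CC(C)C',  # extra substituent on the list-atom carbon
}

for name, smi in targets.items():
    mol = Chem.MolFromSmiles(smi)
    print(name, 'loose:', mol.HasSubstructMatch(loose),
          'strict:', mol.HasSubstructMatch(strict))
```

Isobutane matches the loose pattern but not the strict one, while propane and dimethyl ether match both, which is exactly the extra substitution described above.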

How is your solution handling it?

Best,

Peter




On Tue, Jun 7, 2016 at 1:06 AM Brian Kelley <fustiga...@gmail.com> wrote:

> An interesting conversation came up at work a few days ago regarding
> MolBlocks/CTABs with queries that behave in unexpected ways.  I'm
> tackling some of these issues for reaction processing of .rxn-based
> files and plan on contributing that code relatively soon.  However, I
> hadn't considered turning it into a generic query sanitization/processing
> step.
>
>
> The basic question was: "How do I get a MolBlock to match only at the "R"s
> and not allow substitutions anywhere else, like ChemAxon does?"
>
>
> As it turns out, RDKit is very strict when it looks at RGroups.  This was
> the initial issue when I started sanitizing RGroups.  Basically, there
> are several variants in the wild (ChemDraw/ICM) that produce reactions that
> don't quite follow the CTAB spec.  RDKit expects an atom labeled R to (1)
> actually appear in an "M  RGP" line and (2) have an atom mapping.  For
> instance, an atom labeled "R" that is not in an "M  RGP" line isn't
> considered a wildcard.
>
> Now queries don't really care about "M  RGP", but they do care that it
> isn't a dummy atom.  I'm listing below our current technique to fix these
> issues for CTAB queries and would like some feedback.
>
> Here is the workflow that we have been telling chemists to follow during
> sketching:
>
> 1. Make a proper RGroup.  The MarvinSketch/ChemDraw "R" is not enough; you
> can replace it with "A", but R has special semantics and needs an RGroup
> label defined.
> 2. Aromatize where appropriate.
> 3. (Optionally) protonate so that only the RGroups can match.
> These line up with the following RDKit code snippets:
>
> 1. Fix the "R"s (note: we probably should make proper RGroups, but this
> just adds any-atom matches)
>
> from rdkit import Chem
> from rdkit.Chem import rdqueries
>
> qmol = Chem.MolFromMolBlock(molblock)
> # first, change the "R"s (dummy atoms) into queries matching any real atom
> qmol = Chem.RWMol(qmol)
> dummy_idxs = [a.GetIdx() for a in qmol.GetAtoms() if a.GetAtomicNum() == 0]
> for idx in dummy_idxs:
>     qmol.ReplaceAtom(idx, rdqueries.AtomNumGreaterQueryAtom(0))
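A minimal sketch of step 1 in isolation. To stay self-contained it assumes a SMILES "*" (also a dummy atom, atomic number 0) is an acceptable stand-in for an "R" read from a MolBlock; the helper name r_to_any_atom is hypothetical. Indices are collected before any replacement so the molecule is not modified while its atoms are being iterated.

```python
from rdkit import Chem
from rdkit.Chem import rdqueries


def r_to_any_atom(mol):
    """Hypothetical helper: replace every dummy atom (atomic number 0)
    with a query atom matching any atom whose atomic number is > 0."""
    qmol = Chem.RWMol(mol)
    dummy_idxs = [a.GetIdx() for a in qmol.GetAtoms() if a.GetAtomicNum() == 0]
    for idx in dummy_idxs:
        qmol.ReplaceAtom(idx, rdqueries.AtomNumGreaterQueryAtom(0))
    return qmol


# SMILES "*" stands in for a MolBlock "R" here.
query = r_to_any_atom(Chem.MolFromSmiles('*C(=O)N'))

for smi in ('CC(=O)N', 'ClC(=O)N', 'NC=O'):  # acetamide, carbamoyl chloride, formamide
    print(smi, Chem.MolFromSmiles(smi).HasSubstructMatch(query))
```

Acetamide and carbamoyl chloride should match; formamide should not, because it has only an implicit H where the R sits. Note that other positions are still unconstrained at this point, which is what step 3 addresses.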
>
>
> 2. Aromatize.  This might help or might break things.  It seems to work
> well, even with query lists such as [C,O], but I'm unsure which atom is
> actually used to count the pi electrons during aromaticity perception.  I
> expect it's the first one.  In any case, something needs to happen for
> arbitrary inputs; otherwise the matching doesn't really do what is
> expected.
>
> # We want to see if we can perceive aromaticity; this may be complicated by
> #  query features such as [C,O], but it works OK.
> Chem.SanitizeMol(qmol, Chem.SANITIZE_SETAROMATICITY)
>
> 3. Protonate if the goal is to match only at the RGroups
>
> # third, add explicit Hs so we only match at the Rs
> # I'm unclear whether this can fail in general; I would probably wrap it in
> #  a try...except block
> Chem.SanitizeMol(qmol, Chem.SANITIZE_ADJUSTHS)
> qmol = Chem.MergeQueryHs(Chem.AddHs(qmol))
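A minimal sketch of steps 1 and 3 together on a monosubstituted benzene query, again assuming a SMILES "*" can stand in for the MolBlock "R". Two further assumptions: the freshly inserted query atom has no cached valence, so UpdatePropertyCache(strict=False) is called before AddHs; and the SANITIZE_ADJUSTHS call is skipped because it mainly matters for aromatic heteroatoms, which this example has none of.

```python
from rdkit import Chem
from rdkit.Chem import rdqueries

# Query: benzene carrying one "R" (SMILES "*" stands in for it).
qmol = Chem.RWMol(Chem.MolFromSmiles('*c1ccccc1'))
dummy_idxs = [a.GetIdx() for a in qmol.GetAtoms() if a.GetAtomicNum() == 0]
for idx in dummy_idxs:
    qmol.ReplaceAtom(idx, rdqueries.AtomNumGreaterQueryAtom(0))  # step 1

# Assumption: recompute valences non-strictly so AddHs can run.
qmol.UpdatePropertyCache(strict=False)

# Step 3: make the Hs explicit, then merge them back into their heavy-atom
# neighbours as H-count queries, so the ring CHs cannot carry substituents.
query = Chem.MergeQueryHs(Chem.AddHs(qmol))

for smi in ('Cc1ccccc1', 'Clc1ccccc1', 'Cc1ccccc1C'):  # toluene, PhCl, o-xylene
    print(smi, Chem.MolFromSmiles(smi).HasSubstructMatch(query))
```

Toluene and chlorobenzene should match (the R accepts any atom), while o-xylene should not, because its second methyl sits on a ring carbon that the query now requires to bear an H.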
>
> This could be controlled by flags passed to a SanitizeQuery function, or
> perhaps a PrepareQuery function.
>
> Thoughts?
>
> Cheers,
>  Brian
>
> ------------------------------------------------------------------------------
> What NetFlow Analyzer can do for you? Monitors network bandwidth and
> traffic
> patterns at an interface-level. Reveals which users, apps, and protocols
> are
> consuming the most bandwidth. Provides multi-vendor support for NetFlow,
> J-Flow, sFlow and other flows. Make informed decisions using capacity
> planning reports. https://ad.doubleclick.net/ddm/clk/305295220;132659582;e
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>