And while we're at it, we should consider including options like what
ChemAxon calls "Vague Bond Search"
(https://docs.chemaxon.com/pages/viewpage.action?pageId=22217121#Tautomersearch/Vaguebondsearch/sp-Hybridization-vaguebond).
This would help address some of the aromaticity problems.

On Tue, Jun 7, 2016 at 5:30 AM, Greg Landrum <greg.land...@gmail.com> wrote:

> I think that here it's worth, at least initially, ignoring what is
> currently possible with the RDKit (and how that's implemented) and instead
> thinking about what we want to be able to do.[1]
>
> The goal, I think, is to have some options allowing control over how a
> query coming from a MOL block/CTAB actually matches target molecules. One
> possible model for this would be to look at the options that were available
> for searching in systems like ISIS/Host and ISIS/Base (and whatever it is
> that they are now called). I no longer have access to those, but I would
> guess that someone in the community does, or that some googling will turn
> up documentation describing the options. I remember there being
> options like: "search as drawn", "allow/disallow substitution at
> heteroatoms", "allow substitution everywhere", etc. This may be a good
> starting point, then we can think about what kind of options we want to add
> for interpreting "R" groups or Hs that have been explicitly added to the
> drawing.
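As a rough illustration of how options like these could turn into per-atom decisions when a query is prepared, here is a small sketch. The enum, the function name and the exact rules are hypothetical assumptions about what each option would mean; they do not describe actual ISIS behavior or any existing RDKit API.

```python
from enum import Enum


class SubstitutionMode(Enum):
    """Hypothetical query-matching options, loosely modeled on the
    ISIS-style choices listed above (not an existing RDKit API)."""
    AS_DRAWN = "search as drawn"
    NO_HETEROATOM_SUBSTITUTION = "disallow substitution at heteroatoms"
    ANYWHERE = "allow substitution everywhere"


def may_carry_extra_substituents(mode: SubstitutionMode, element: str,
                                 is_r_group: bool = False) -> bool:
    """Return True if a query atom may match a target atom that has
    heavy-atom neighbors beyond those drawn in the query.

    Assumed semantics: R-group positions are always open; "as drawn"
    closes every other atom; the heteroatom mode closes non-carbon atoms.
    """
    if is_r_group:
        return True
    if mode is SubstitutionMode.AS_DRAWN:
        return False
    if mode is SubstitutionMode.NO_HETEROATOM_SUBSTITUTION:
        return element == "C"
    return True  # SubstitutionMode.ANYWHERE
```

Under this reading, a preparation step would add hydrogen-count constraints only to the atoms for which the function returns False, leaving the others free to match substituted target atoms.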
>
> Does the thought make sense to you guys? Does anyone have access to, or
> a better memory of, those search options?
>
> -greg
> [1] all the while keeping somewhere in mind that the core of the RDKit is
> really using a more "Daylight-like" model and that there is almost
> certainly going to be some mismatch with the MDL model... but we'll worry
> about that when we get there.
>
>
>
> On Mon, Jun 6, 2016 at 7:04 PM, Brian Kelley <fustiga...@gmail.com> wrote:
>
>> An interesting conversation came up at work a few days ago about
>> MolBlocks/CTABs with queries that behave unexpectedly.  I'm tackling
>> some of these issues in the context of processing .rxn-based reaction
>> files and plan to contribute that work relatively soon.  However, I
>> hadn't considered making it generic, query-based sanitization/processing.
>>
>>
>> The basic question was: "How do I get a MolBlock to match only at the
>> "R"s and not allow substitution anywhere else, like in ChemAxon?"
>>
>>
>> As it turns out, RDKit is very strict about RGroups.  This was the initial
>> issue when I started sanitizing RGroups.  Basically, there are several
>> variants in the wild (ChemDraw/ICM) that produce reactions that don't
>> quite follow the CTAB spec.  RDKit expects an atom labeled R to (1)
>> actually appear in an "M  RGP" line and (2) have an atom mapping.  For
>> instance, an atom labeled "R" that is not listed in "M  RGP" isn't
>> considered a wildcard.
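To make that failure mode concrete, here is a minimal pure-Python check for the case described: atoms labeled R (or R#) in a V2000 CTAB that never appear on an "M  RGP" line. `find_undeclared_r_atoms` is a hypothetical helper, not part of RDKit, and it assumes the atom-block fields are whitespace separated, which holds for typical files but is not guaranteed by the fixed-width format.

```python
from typing import List


def find_undeclared_r_atoms(molblock: str) -> List[int]:
    """Hypothetical helper (not an RDKit function): return the 1-based
    indices of atoms labeled "R" or "R#" in a V2000 CTAB that are not
    listed on any "M  RGP" line.

    Assumes whitespace-separated atom-block fields.
    """
    lines = molblock.splitlines()
    n_atoms = int(lines[3][0:3])  # counts line follows the 3-line header
    r_atoms = []
    for i in range(n_atoms):
        symbol = lines[4 + i].split()[3]  # x, y, z, then the atom symbol
        if symbol in ("R", "R#"):
            r_atoms.append(i + 1)

    declared = set()
    for line in lines[4 + n_atoms:]:
        if line.startswith("M  RGP"):
            tokens = line[6:].split()
            # first token is the entry count, then (atom, R-group number) pairs
            for k in range(int(tokens[0])):
                declared.add(int(tokens[1 + 2 * k]))
    return [idx for idx in r_atoms if idx not in declared]


# A two-atom CTAB (C bonded to R#), with and without an "M  RGP" line.
_BASE = [
    "",
    "  hypothetical example",
    "",
    "  2  1  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0",
    "    1.5000    0.0000    0.0000 R#  0  0",
    "  1  2  1  0",
]
WITHOUT_RGP = "\n".join(_BASE + ["M  END"])
WITH_RGP = "\n".join(_BASE + ["M  RGP  1   2   1", "M  END"])

print(find_undeclared_r_atoms(WITHOUT_RGP))  # [2]
print(find_undeclared_r_atoms(WITH_RGP))     # []
```

A check like this could run before query preparation to warn the chemist that an "R" will not behave as a wildcard.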
>>
>> Queries, on the other hand, don't really care about "M  RGP", but they do
>> care whether the "R" is read in as a dummy atom.  Below is our current
>> technique for fixing these issues in CTAB queries; I'd like some feedback.
>>
>> Here is the workflow we have been recommending to chemists when sketching:
>>
>> 1. Make a proper RGroup.  The MarvinSketch/ChemDraw "R" alone is not
>> enough; you can replace it with "A", but "R" has special semantics and
>> needs an RGroup label defined.
>> 2. Aromatize where appropriate.
>> 3. (Optionally) protonate so that only the RGroups can match.
>>
>> These line up with the following RDKit code snippets:
>>
>> 1. Fix the "R"s (note: we probably should make proper RGroups, but this
>> just turns them into wildcard matches)
>>
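>> from rdkit import Chem
>> from rdkit.Chem import rdqueries
>>
>> qmol = Chem.MolFromMolBlock(molblock)
>> # first, change the "R"s (atomic number 0) into queries matching any atom
>> # with atomic number > 0; collect indices before modifying the molecule
>> qmol = Chem.RWMol(qmol)
>> dummy_idxs = [atom.GetIdx() for atom in qmol.GetAtoms()
>>               if atom.GetAtomicNum() == 0]
>> for idx in dummy_idxs:
>>     qmol.ReplaceAtom(idx, rdqueries.AtomNumGreaterQueryAtom(0))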
>>
>>
>> 2. Aromatize.  This might help or might break things.  It seems to work
>> well, even with OR queries such as [C,O], but I'm unsure which atom is
>> actually used to contribute the pi electrons during aromaticity
>> perception; I expect the first one.  In any case, something needs to be
>> done for arbitrary inputs; otherwise the matching doesn't really do what
>> is expected.
>>
>> # Perceive aromaticity; query features such as [C,O] may complicate
>> # this, but it works OK in practice.
>> Chem.SanitizeMol(qmol, Chem.SANITIZE_SETAROMATICITY)
>>
>> 3. Protonate if the goal is to match only at the RGroups.
>>
>> # add explicit Hs and merge them into the query so we only match at the Rs
>> # I'm not sure whether this can fail in general; I would probably wrap it
>> # in a try...except block
>> Chem.SanitizeMol(qmol, Chem.SANITIZE_ADJUSTHS)
>> qmol = Chem.MergeQueryHs(Chem.AddHs(qmol))
>>
>> This could be exposed as flags to a SanitizeQuery function, or perhaps
>> to a PrepareQuery function.
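One way to picture the flag-based idea is a single helper that chains the three snippets above. This is a sketch, not an RDKit function: `prepare_query` and its `aromatize`/`only_match_r` flags are hypothetical names. The RDKit calls themselves (`rdqueries.AtomNumGreaterQueryAtom`, `SanitizeMol`, `AddHs`, `MergeQueryHs`) are the ones used in the thread; the `UpdatePropertyCache(strict=False)` call is an added assumption, included so that `AddHs` has valence information for the replaced atoms.

```python
from rdkit import Chem
from rdkit.Chem import rdqueries


def prepare_query(molblock, aromatize=True, only_match_r=True):
    """Hypothetical PrepareQuery-style helper combining steps 1-3.

    Not an RDKit API; the name and keyword flags are assumptions."""
    mol = Chem.MolFromMolBlock(molblock)
    if mol is None:
        raise ValueError("could not parse MolBlock")
    qmol = Chem.RWMol(mol)

    # Step 1: turn dummy atoms (the sketched "R"s) into queries matching any
    # atom with atomic number > 0. Indices are collected first so the
    # molecule is not modified while iterating over its atoms.
    dummy_idxs = [a.GetIdx() for a in qmol.GetAtoms() if a.GetAtomicNum() == 0]
    for idx in dummy_idxs:
        qmol.ReplaceAtom(idx, rdqueries.AtomNumGreaterQueryAtom(0))
    # Recompute valence info for the replaced atoms without strict checks.
    qmol.UpdatePropertyCache(strict=False)

    # Step 2: perceive aromaticity.
    if aromatize:
        Chem.SanitizeMol(qmol, Chem.SANITIZE_SETAROMATICITY)

    # Step 3: make hydrogens explicit, then fold them into H-count queries on
    # their heavy atoms, so extra substituents can only appear at the Rs.
    if only_match_r:
        Chem.SanitizeMol(qmol, Chem.SANITIZE_ADJUSTHS)
        return Chem.MergeQueryHs(Chem.AddHs(qmol))
    return qmol.GetMol()


if __name__ == "__main__":
    template = Chem.MolToMolBlock(Chem.MolFromSmiles("*c1ccccc1"))
    query = prepare_query(template)
    for smi in ("Cc1ccccc1", "Cc1ccc(C)cc1"):
        print(smi, Chem.MolFromSmiles(smi).HasSubstructMatch(query))
```

With `only_match_r=True`, a query sketched as `*c1ccccc1` should match toluene but not p-xylene, because the merged hydrogen counts block substitution on the ring; with `only_match_r=False` both should match.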
>>
>> Thoughts?
>>
>> Cheers,
>>  Brian
>>
>>
>> _______________________________________________
>> Rdkit-discuss mailing list
>> Rdkit-discuss@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>
>>
>