I also like the term "FuzzyBonds" better than vague bonds if we get to
rename it :)

Cheers,
 Brian

On Tue, Jun 7, 2016 at 3:21 PM, Brian Kelley <fustiga...@gmail.com> wrote:

> I was also thinking that instead of protonating, we could just AND each
> atom's query with a heavy-atom degree query whose degree equals the atom's
> current degree.  This should have the same effect, correct?
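
A minimal sketch of that degree idea, not checked against real CTAB queries.
It assumes the pattern's atoms are query atoms (as they are for a
SMARTS-derived pattern), and it uses explicit degree as a stand-in for
heavy-atom degree, which only holds while the target has no explicit H atoms
in its graph. pin_heavy_degree is a made-up helper name, not an RDKit function:

```python
from rdkit import Chem
from rdkit.Chem import rdqueries


def pin_heavy_degree(query):
    """Hypothetical helper: AND a degree query onto every non-dummy atom.

    Each constrained atom must then have exactly as many neighbors in the
    target as it has in the query, so only the dummy ("R") positions can
    carry substituents.
    """
    pinned = Chem.Mol(query)  # work on a copy, leave the input untouched
    for idx in range(pinned.GetNumAtoms()):
        atom = pinned.GetAtomWithIdx(idx)
        if atom.GetAtomicNum() == 0:
            continue  # dummy/wildcard atoms stay unconstrained
        # Assumption: explicit degree stands in for heavy-atom degree.
        degree_query = rdqueries.ExplicitDegreeEqualsQueryAtom(atom.GetDegree())
        atom.ExpandQuery(degree_query)  # the default composite type is AND
    return pinned


# Benzene with a single wildcard substituent, written as SMARTS for brevity.
query = Chem.MolFromSmarts('c1ccccc1*')
pinned_query = pin_heavy_degree(query)

toluene = Chem.MolFromSmiles('Cc1ccccc1')
xylene = Chem.MolFromSmiles('Cc1ccc(C)cc1')
for name, mol in (('toluene', toluene), ('p-xylene', xylene)):
    print(name, mol.HasSubstructMatch(query), mol.HasSubstructMatch(pinned_query))
```

If I have this right, the unpinned pattern accepts both molecules while the
pinned one should only accept toluene, which is the outcome the protonation
step is after.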
>
> Cheers,
>  Brian
>
> On Tue, Jun 7, 2016 at 12:37 AM, Greg Landrum <greg.land...@gmail.com>
> wrote:
>
>> And while we're at it we should think about including options like what
>> ChemAxon calls "Vague Bond Search" (
>> https://docs.chemaxon.com/pages/viewpage.action?pageId=22217121#Tautomersearch/Vaguebondsearch/sp-Hybridization-vaguebond).
>> This would help address some of the aromaticity problems.
>>
>> On Tue, Jun 7, 2016 at 5:30 AM, Greg Landrum <greg.land...@gmail.com>
>> wrote:
>>
>>> I think that here it's worth, at least initially, ignoring what is
>>> currently possible with the RDKit (and how that's implemented) and instead
>>> thinking about what we want to be able to do.[1]
>>>
>>> The goal, I think, is to have some options allowing control over how a
>>> query coming from a MOL block/CTAB actually matches target molecules. One
>>> possible model for this would be to look at the options that were available
>>> for searching in systems like ISIS/Host and ISIS/Base (and whatever it is
>>> that they are now called). I no longer have access to those, but I would
>>> guess that someone in the community does, or that some googling will turn
>>> up documentation describing/showing the options. I remember there being
>>> options like: "search as drawn", "allow/disallow substitution at
>>> heteroatoms", "allow substitution everywhere", etc. This may be a good
>>> starting point, then we can think about what kind of options we want to add
>>> for interpreting "R" groups or Hs that have been explicitly added to the
>>> drawing.
>>>
>>> Does the thought make sense to you guys? Does anyone have access
>>> to/remember better what those search options are?
>>>
>>> -greg
>>> [1] all the while keeping somewhere in mind that the core of the RDKit
>>> is really using a more "Daylight-like" model and that there is almost
>>> certainly going to be some mismatch with the MDL model... but we'll worry
>>> about that when we get there.
>>>
>>>
>>>
>>> On Mon, Jun 6, 2016 at 7:04 PM, Brian Kelley <fustiga...@gmail.com>
>>> wrote:
>>>
>>>> An interesting conversation came up at work a few days ago regarding
>>>> MolBlocks/CTABs with queries that behave in an unexpected manner.  I'm
>>>> tackling some of these issues for reaction processing of .rxn-based
>>>> files and plan on contributing that work relatively soon.  However, I
>>>> hadn't considered making it a generic query-based
>>>> sanitization/processing step.
>>>>
>>>>
>>>> The basic question was "How do I get a MolBlock to only match the "R"'s
>>>> and not allow substitutions anywhere else? like ChemAxon..."
>>>>
>>>>
>>>> As it turns out, RDKit is very strict when it looks at RGroups.  This
>>>> was the initial issue when I started sanitizing RGroups.  Basically
>>>> there are several variants in the wild (ChemDraw/ICM) that make reactions
>>>> that don't quite follow the CTAB spec.  RDKit likes the atom labeled R to
>>>> (1) actually be in an "M  RGP" tag and (2) have an atom mapping.  If an
>>>> atom is labeled "R" and is not in an R_GRP, it isn't considered a
>>>> wildcard, for instance.
>>>>
>>>> Now queries don't really care about "M  RGP", but they do care that it
>>>> isn't a dummy atom.  I'm listing below our current technique to fix these
>>>> issues for CTAB queries and would like some feedback.
>>>>
>>>> Here is the workflow that we have been telling chemists during
>>>> sketching:
>>>>
>>>> 1. Make a proper group.  The MarvinSketch/ChemDraw "R" is not enough;
>>>> you can replace it with "A", but R has special semantics and needs an
>>>> RGroup label defined.
>>>> 2. Aromatize where appropriate.
>>>> 3. (Optionally) protonate so only RGroups can match.
>>>>
>>>> These line up with the following RDKit code snippets:
>>>>
>>>> 1. Fix the "R"s (note we probably should make proper RGroups, but this
>>>> just adds dummy matches)
>>>>
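>>>> from rdkit import Chem
>>>> from rdkit.Chem import rdqueries
>>>>
>>>> # molblock holds the CTAB text of the sketched query
>>>> qmol = Chem.RWMol(Chem.MolFromMolBlock(molblock))
>>>> # first, change the "R"s (dummy atoms) into queries matching any atom;
>>>> # collect the indices up front so atoms are not replaced while iterating
>>>> dummy_idxs = [a.GetIdx() for a in qmol.GetAtoms() if a.GetAtomicNum() == 0]
>>>> for idx in dummy_idxs:
>>>>     qmol.ReplaceAtom(idx, rdqueries.AtomNumGreaterQueryAtom(0))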
>>>>
>>>>
>>>> 2. aromatize - this might be good or might break things.  It seems to
>>>> work great, even with conditional logic, i.e. [C,O], but I'm unsure which
>>>> atom is actually being used to supply the pi electrons for aromaticity
>>>> checking.  I expect the first, actually.  In any case, something needs to
>>>> happen in general for random inputs, otherwise the matching doesn't really
>>>> do what is expected.
>>>>
>>>> # We want to see if we can find aromaticity; this may be complicated
>>>> #  with query features like [C,O], but it works ok.
>>>> Chem.SanitizeMol(qmol, Chem.SANITIZE_SETAROMATICITY)
>>>>
>>>> 3. protonate if the desire is to only match RGroups
>>>>
>>>> # second, add explicit Hs so we only match the Rs
>>>> # I'm unclear if this can fail in general, I would probably wrap this in
>>>> #  a try...except block
>>>> Chem.SanitizeMol(qmol, Chem.SANITIZE_ADJUSTHS)
>>>> qmol = Chem.MergeQueryHs(Chem.AddHs(qmol))
>>>>
>>>> This could be enabled with flags into a SanitizeQuery function, or
>>>> perhaps a PrepareQuery function.
>>>>
>>>> Thoughts?
>>>>
>>>> Cheers,
>>>>  Brian
>>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>> What NetFlow Analyzer can do for you? Monitors network bandwidth and
>>>> traffic
>>>> patterns at an interface-level. Reveals which users, apps, and
>>>> protocols are
>>>> consuming the most bandwidth. Provides multi-vendor support for NetFlow,
>>>> J-Flow, sFlow and other flows. Make informed decisions using capacity
>>>> planning reports.
>>>> https://ad.doubleclick.net/ddm/clk/305295220;132659582;e
>>>> _______________________________________________
>>>> Rdkit-discuss mailing list
>>>> Rdkit-discuss@lists.sourceforge.net
>>>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>>>
>>>>
>>>
>>
>>
>>
>>
>
