Re: [Rdkit-discuss] how to use structure as substructure query

Greg Landrum Tue, 04 Dec 2012 02:57:27 -0800

On Tue, Dec 4, 2012 at 11:38 AM, Andrew Dalke <[email protected]> wrote:


> I am beginning to realize the error of my ways.
>
> This is the same issue which occurred in fmcs. Suppose
> you have c1ccccc1C and CC. The MCS between those two is
> [#6]-[#6]. Atom aromaticity is not useful when doing
> a comparison.
>
>
> On Dec 4, 2012, at 5:32 AM, Greg Landrum wrote:
> > Aromaticity is ignored, so this is correct.
>
> Yes. Atom aromaticity is ignored. Bond aromaticity is not.
>

correct, that's how I started that message off:
Aromaticity is not used in the matching criteria for atoms.
Bonds are matched purely using bond type, with the one exception that a
bond of unspecified type matches anything and is matched by anything.
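
As a small illustration of those two rules, here is a minimal sketch that
uses molecules built from SMILES as the queries. The variable names are
only for illustration; the molecules are the ones mentioned in this thread.

```python
from rdkit import Chem

toluene = Chem.MolFromSmiles("c1ccccc1C")
benzene = Chem.MolFromSmiles("c1ccccc1")

# Queries built with MolFromSmiles are ordinary molecules, not SMARTS queries.
ethane_query = Chem.MolFromSmiles("CC")
cyclohexane_query = Chem.MolFromSmiles("C1CCCCC1")

# Atom aromaticity is ignored: the aliphatic C-C query matches the single
# bond between the aromatic ring carbon and the methyl carbon.
ethane_in_toluene = toluene.HasSubstructMatch(ethane_query)

# Bond type is not ignored: single bonds in the query do not match the
# aromatic bonds of benzene.
cyclohexane_in_benzene = benzene.HasSubstructMatch(cyclohexane_query)

print(ethane_in_toluene, cyclohexane_in_benzene)
```
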


> >> Based on this, it seems that I can't use SMARTS patterns to define
> >> a screen which is easily compatible with the molecule-based substructure
> >> matcher.
> >>
> > Sure you can, but you'll need to use atomic numbers in the SMARTS
> > instead of letters in order to avoid the aromatic/aliphatic queries.
>
> Very good point. I'll have to go back and redo how I build my SMARTS
> fragments in the first place. But then again, if I use aromaticity-free
> SMARTS then I'll end up with a toolkit-independent set of patterns, which
> is what you have:
>
> >                       "[R]~1[R]~[R]~[R]~[R]~[R]~1",
> >
> >      These don't suffer from the aromatic/aliphatic problem.
>
>
> However, this then places a larger burden on the substructure matcher,
> since it will say that cyclohexane doesn't match benzene and vice versa,
> even though the above screen does not distinguish between the two.
>

hmm, not sure what you mean there. Given that no fingerprint screen is
going to be perfect, the substructure matcher is going to have to do this
anyway.
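
For reference, the atomic-number advice quoted above can be sketched as
follows. This is only an illustration with made-up variable names: a SMARTS
written with element letters carries an aliphatic/aromatic constraint, while
one written with atomic numbers does not.

```python
from rdkit import Chem

toluene = Chem.MolFromSmiles("c1ccccc1C")

# In SMARTS, "C" means aliphatic carbon, so "CC" needs two bonded
# aliphatic carbons. Toluene has only one aliphatic carbon.
letter_query = Chem.MolFromSmarts("CC")

# "[#6]" means any carbon, aromatic or not.
number_query = Chem.MolFromSmarts("[#6]-[#6]")

letter_hit = toluene.HasSubstructMatch(letter_query)
number_hit = toluene.HasSubstructMatch(number_query)

print(letter_hit, number_hit)
```
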


> It seems that having some aromatic bond-based screens would help.
>
>
> BTW, there's a typo in the above. It should be
> "[R]~1~[R]~[R]~[R]~[R]~[R]~1".
> The pattern is missing a '~' between the first and second [R]. But 6
> element rings with only double and triple bonds are .. explosive? At the
> very least, quite unlikely.
>

Right you are. Thanks!
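
A quick sketch of how the corrected pattern behaves, assuming nothing beyond
the standard SMARTS parser: the ring screen hits both benzene and
cyclohexane, so it is the substructure matcher that has to tell them apart.

```python
from rdkit import Chem

# Corrected six-membered ring pattern: any ring atoms, any bonds.
ring_screen = Chem.MolFromSmarts("[R]~1~[R]~[R]~[R]~[R]~[R]~1")

benzene = Chem.MolFromSmiles("c1ccccc1")
cyclohexane = Chem.MolFromSmiles("C1CCCCC1")

# The screen does not distinguish between the two rings.
screen_hits = [m.HasSubstructMatch(ring_screen) for m in (benzene, cyclohexane)]

# The molecule-based matcher does, because it compares bond types.
benzene_in_cyclohexane = cyclohexane.HasSubstructMatch(benzene)

print(screen_hits, benzene_in_cyclohexane)
```
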

> >>
> >> What I think I can do is:
> >>   1) parse the SMILES for the query
> >>   2) remove any explicit hydrogens
> >>   3) use Chem.MolFragmentToSmiles to turn the de-hydrogenated molecule
> >>       into a SMARTS string
> >>   4) convert the SMARTS into the actual query
> >>
> > I think you can just use Chem.MolToSmiles(dhmol,canonical=False), but
> > otherwise this flow looks ok. If you could "trust" your users to always
> > provide aromatic SMILES, you could just skip this whole mess and use
> > MolFromSmarts at the beginning. I guess you're trying to avoid that though.
>
> Ahh, yes, that would also produce a viable SMARTS.
>
> I can't use MolFromSmarts from the beginning because "n1nc[nH]c1" has the
> explicit hydrogen which was not a user-specified constraint but only added
> by the sketcher.


ah, ok. Of course, that H could theoretically be there because the user
wanted an H in that position... it's always tough to say.
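
One possible sketch of the SMILES-to-query flow discussed above. It is a
variant, not necessarily what either of us would end up using: it assumes
the hydrogen counts should be dropped entirely, and it passes custom
atomSymbols and bondSymbols to Chem.MolFragmentToSmiles so that atoms are
written as atomic numbers and bond types are written explicitly. The helper
name smiles_to_query and the bond symbol table are hypothetical.

```python
from rdkit import Chem

# Hypothetical lookup table: SMARTS symbol for each bond type we expect.
BOND_SYMBOLS = {
    Chem.BondType.SINGLE: "-",
    Chem.BondType.DOUBLE: "=",
    Chem.BondType.TRIPLE: "#",
    Chem.BondType.AROMATIC: ":",
}


def smiles_to_query(smiles):
    """Hypothetical helper: SMILES in, (SMARTS string, query molecule) out."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError("could not parse SMILES: %r" % (smiles,))
    # Atoms become "[#n]": no aromaticity flag and no hydrogen count.
    atom_symbols = ["[#%d]" % atom.GetAtomicNum() for atom in mol.GetAtoms()]
    # Bonds keep their type; anything unexpected falls back to "any bond".
    bond_symbols = [BOND_SYMBOLS.get(bond.GetBondType(), "~")
                    for bond in mol.GetBonds()]
    smarts = Chem.MolFragmentToSmiles(
        mol, list(range(mol.GetNumAtoms())),
        atomSymbols=atom_symbols, bondSymbols=bond_symbols)
    return smarts, Chem.MolFromSmarts(smarts)


smarts, query = smiles_to_query("n1nc[nH]c1")
target = Chem.MolFromSmiles("Cn1cnnc1")  # N-methylated, so no ring N-H

relaxed_hit = target.HasSubstructMatch(query)
strict_hit = target.HasSubstructMatch(Chem.MolFromSmarts("n1nc[nH]c1"))
print(smarts, relaxed_hit, strict_hit)
```

The direct MolFromSmarts query keeps the [nH] constraint and so rejects the
N-methylated target, while the rebuilt query accepts it. Whether that is the
desired behavior depends on whether the H was intended by the user.
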

-greg
