Hi Andrew,

I'm going to split my reply into pieces so that I can answer in a
reasonable amount of time.
This one covers clarifying questions and quick answers.

On Thu, May 26, 2011 at 4:02 PM, Andrew Dalke <da...@dalkescientific.com> wrote:
> RDKit implements the MACCS keys as a set of SMARTS patterns,
> plus a few bits coded by hand.
>
> I don't know how aware people are of the impact of this on the other
> free software projects. OpenBabel and CDK both use copies of the
> RDKit definitions for their own MACCS keys. While I've seen
> earlier internal definitions, they were held rather closely, so
> it's very nice to have a public definition.
>
> I'm reviewing the definitions as part of my chemfp project,
> which is one of the advantages to having an open definition.
>
> I've got some questions or suggestions about them. For reference,
> see http://rdkit.org/Python_Docs/rdkit.Chem.MACCSkeys-pysrc.html
>
>
> * Bit 1 is
>
>   1:('?',0), # ISOTOPE
>
> This explicitly isn't defined, but shouldn't it be [!0*] ?
>
> I tried out that SMARTS and I see that a SMILES of "C[14CH3]"
> has two matches in RDKit to [!0*] but in OEChem there's only
> one. I think the OEChem version is correct. I verified it at
>
> http://www.daylight.com/daycgi_tutorials/depictmatch.cgi
> with the SMILES of C[14CH2][13CH3] and SMARTS of [!0*] .
> Daylight matches 2 of the 3 atoms.
>
> I think this is a bug in RDKit, and once fixed it would
> mean this bit could be supported.
>

My reading of the SMARTS theory manual
(http://www.daylight.com/dayhtml/doc/theory/theory.smarts.html) says
that [0*] means "any atom with a mass of 0", so [!0*] would be "any
atom that doesn't have a mass of 0". What am I missing?
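
To make the disagreement concrete, here is a toy model of the two
readings. It uses no toolkit at all: the atom lists are written out by
hand, the helper names are hypothetical, and the idea that an unlabeled
atom carries a nonzero default mass is only an assumption about why a
toolkit might report the extra match.

```python
# Toy model of two readings of the SMARTS [!0*]; no chemistry toolkit is used.
# Each atom is (symbol, isotope); isotope is None when the SMILES has no label.
# All names here are hypothetical and exist only for illustration.

def matches_unspecified_is_zero(isotope):
    """Reading A: an unlabeled atom counts as isotope 0, so [!0*] skips it."""
    value = 0 if isotope is None else isotope
    return value != 0


def matches_unspecified_is_nonzero(isotope, default_mass=12):
    """Reading B (assumed): an unlabeled atom carries a nonzero default mass."""
    value = default_mass if isotope is None else isotope
    return value != 0


def count_matches(atoms, predicate):
    """Count the atoms whose isotope satisfies the given reading of [!0*]."""
    return sum(1 for _symbol, isotope in atoms if predicate(isotope))


# C[14CH3] and C[14CH2][13CH3], transcribed by hand
ethane_14c = [("C", None), ("C", 14)]
propane_labeled = [("C", None), ("C", 14), ("C", 13)]

for name, atoms in (("C[14CH3]", ethane_14c),
                    ("C[14CH2][13CH3]", propane_labeled)):
    print(name,
          count_matches(atoms, matches_unspecified_is_zero),
          count_matches(atoms, matches_unspecified_is_nonzero))
```

Under reading A the counts agree with the OEChem and Daylight results
quoted above (one match and two matches); under reading B every atom
matches, which is the two-match result reported for C[14CH3].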

>
> * Bit 2 is
>
>  #2:('[#103,#104,#105,#106,#107,#106,#109,#110,#111,#112]',0),  # ISOTOPE Not complete
>   2:('[#103,#104]',0),  # ISOTOPE Not complete
>
> I assume the comment is wrong, since this has nothing to do with isotopes.
>
> What's not complete about this definition, and/or why is the first one 
> commented out?

I've got to see if I can find a description of the bits and I'll come
back to these definition questions.

>
>   18:('[B,Al,Ga,In,Tl]',0), # Group IIIA (B...) *NOTE* spec wrong
>
> Boron may be aromatic according to the SMILES spec, so this
> should be [B,b, ...] or [#5, ... ].
>
> Also, here are the aromatic elements in OpenBabel:
>
> [se]
> [as]
> [si]
> [ge]
> [sb]
> [bi]
> [te]
> [sn]
>
> Not all of these are valid SMARTS according to Daylight, and
> RDKit doesn't support the same set of aromatics, so for a
> portable version (which I'm working on) they can be written as
>
> [#34], [#33], [#14], ...
>
> Oh, and aromatic lead has been synthesized
>
>  http://www.rsc.org/chemistryworld/News/2010/April/15041002.asp
>

Agreed that using the generic atomic-number form makes a lot more sense.
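
As a rough sketch of what that rewrite could look like, the helper below
turns element symbols (aliphatic or aromatic) into the atomic-number
form. The lookup table and the function name are hypothetical and only
cover the elements mentioned in this thread; this is not how the
MACCSkeys module is written.

```python
# Hypothetical helper: build a portable SMARTS atom list from element symbols.
# The table only covers the elements mentioned in this thread.
ATOMIC_NUMBERS = {
    "B": 5, "Al": 13, "Si": 14, "Ga": 31, "Ge": 32, "As": 33, "Se": 34,
    "In": 49, "Sn": 50, "Sb": 51, "Te": 52, "Tl": 81, "Pb": 82, "Bi": 83,
}


def atomic_number_smarts(symbols):
    """Return e.g. '[#5,#13]' for ['B', 'Al']; lowercase aromatic symbols work too."""
    numbers = [ATOMIC_NUMBERS[symbol.capitalize()] for symbol in symbols]
    return "[" + ",".join(f"#{number}" for number in numbers) + "]"


# Bit 18 (Group IIIA), matching both aliphatic and aromatic boron
group_3a = atomic_number_smarts(["B", "Al", "Ga", "In", "Tl"])
print(group_3a)

# The OpenBabel aromatic symbols listed above, as single-atom patterns
aromatic_symbols = ["se", "as", "si", "ge", "sb", "bi", "te", "sn"]
print([atomic_number_smarts([symbol]) for symbol in aromatic_symbols])
```

Because `[#5]` matches boron regardless of aromaticity, the pattern no
longer depends on which aromatic symbols a given toolkit accepts.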

> * Bit 101 says:
>
>  8M Ring or larger. This only handles up to ring sizes of 14
>
> Is it worthwhile to support larger rings? I don't think so.
> If yes, then it could be dealt with outside of the SMARTS,
> just like 125 and 166.

Agreed that it's not really necessary to support larger rings. Systems
with rings larger than 14 would end up missing a single bit.
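
If we ever did want it, handling the bit outside of the SMARTS could be
as simple as the sketch below. It assumes the ring sizes have already
been obtained from the toolkit's ring perception and are passed in as
plain integers; both function names are hypothetical.

```python
# Hypothetical sketch: decide bit 101 ("8M ring or larger") from ring sizes
# that some toolkit has already computed, instead of from a SMARTS pattern.

def bit_101_from_ring_sizes(ring_sizes, min_size=8):
    """True if any ring has at least min_size atoms, with no upper limit."""
    return any(size >= min_size for size in ring_sizes)


def bit_101_capped(ring_sizes, min_size=8, max_size=14):
    """Mimic a pattern that only enumerates ring sizes min_size..max_size."""
    return any(min_size <= size <= max_size for size in ring_sizes)


# A molecule with a benzene ring and a 16-membered macrocycle
sizes = [6, 16]
print(bit_101_from_ring_sizes(sizes), bit_101_capped(sizes))
```

The two functions only disagree for molecules whose large rings all
have more than 14 atoms, which is the single missing bit mentioned above.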

> BTW, I also verified that all of the CH2 atoms were written
> as either [CH2] (if there are two bonds to other atoms) or
> [C;H2,H3] if there is only one bond (and similar with [NH2]).
> While strange chemistries can cause this to fail as a
> substructure filter, I recognize that that is outside
> the scope of those definitions.

It certainly is for me. :-)

-greg

_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss