I see. Are these what you call "layered" fingerprints? How do they differ from
the Daylight-like fingerprints?
Looking forward to the C++ sample code.
Thank you very much for all your help.
Gonzalo
From: Greg Landrum [mailto:[email protected]]
Sent: 09 June 2013 17:29
To: Gonzalo Colmenarejo-Sanchez
Cc: [email protected]
Subject: Re: [Rdkit-discuss] substructure search with fingerprints in C++
On Sun, Jun 9, 2013 at 12:29 PM, Gonzalo Colmenarejo-Sanchez
<[email protected]> wrote:
Yes, C++ code examples for preprocessed molecules and fingerprints would be
extremely helpful too.
I'll put one together and send it along. I don't normally do file i/o from C++,
so it's taking me longer than I expected to get it working.
By the way, if the query is a SMARTS like e.g. "c1aaccc1" (representing several
substructures), what fingerprint is exposed to AllProbeBitsMatch, the union of
all the possible fingerprints, all the possible fingerprints sequentially, etc?
It's a single fingerprint. The code essentially leaves out of the fingerprint
any substructure that includes a query feature. This means that the FPs are not
very effective as a screen if your query molecules include a high density of
query features.
Here's an example showing what happens with an extremely simple case.
Start with a simple molecule:
In [21]: list(Chem.PatternFingerprint(Chem.MolFromSmiles('CC')).GetOnBits())
Out[21]: [429, 778, 1022]
This matches one substructure query pattern "[*]~[*]" twice, so it sets three
bits: one bit for each match and one for the fact that the match is "CC".
Constructing the same molecule from SMARTS gives the same result; the
fingerprinter knows how to deal sensibly with the implicit queries in SMARTS:
In [22]: list(Chem.PatternFingerprint(Chem.MolFromSmarts('CC')).GetOnBits())
Out[22]: [429, 778, 1022]
But as soon as I add a query feature, I lose a bit:
In [23]: list(Chem.PatternFingerprint(Chem.MolFromSmarts('C[A]')).GetOnBits())
Out[23]: [429, 1022]
This still matches "[*]~[*]" twice, but since the match involves a query
feature, there's no bit set for the match itself.
If I make the match asymmetric, I get four bits:
In [24]: list(Chem.PatternFingerprint(Chem.MolFromSmarts('CO')).GetOnBits())
Out[24]: [54, 429, 759, 1022]
This matches "[*]~[*]" twice, but "OC" and "CO" generate different bits.
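To connect this back to the AllProbeBitsMatch question, here is a minimal sketch
of the screening logic in plain Python. It does not call RDKit: the on-bit lists
are copied from the session above, and all_probe_bits_match and screen are
hypothetical stand-ins that work on sets of bit indices, assuming the screen
passes a molecule only when every bit of the query fingerprint is also set in
the molecule's fingerprint.

```python
# On-bit lists copied from the IPython session above (not recomputed here).
ON_BITS = {
    "CC": {429, 778, 1022},
    "C[A]": {429, 1022},
    "CO": {54, 429, 759, 1022},
}


def all_probe_bits_match(probe_bits, ref_bits):
    """Return True if every bit set in the probe is also set in the reference.

    Hypothetical pure-Python stand-in for the screening test; it operates on
    sets of on-bit indices rather than on RDKit bit vectors.
    """
    return probe_bits <= ref_bits


def screen(query, candidates):
    """Return the candidates that the fingerprint screen cannot rule out."""
    return [
        name
        for name in candidates
        if all_probe_bits_match(ON_BITS[query], ON_BITS[name])
    ]


if __name__ == "__main__":
    # The query with a query feature sets fewer bits, so it screens out less.
    print(screen("C[A]", ["CC", "CO"]))
    # The fully specified query sets more bits and is more selective.
    print(screen("CO", ["CC", "CO"]))
```

With these bit lists, the "C[A]" query passes both "CC" and "CO" because its two
bits are present in each, while the "CO" query passes only "CO". Anything that
passes the screen still needs a real substructure match to confirm it.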
Make sense?
-greg