(getting dangerously old fart chatty here but) we crafted an in-house
molecular fingerprint once which was designed to hash out whether a
compound would've pissed off the high-throughput/organic-chemists or not.
(essentially anything with "exotic atoms" (like boron?) or strained bonds
(e.g. bond angles under 120 degrees)).  so that fingerprint collided all of
chemical space into two bins.  "That's not a fingerprint!"...yeah, but it
fell right into the code alongside the fingerprints and was used as such.

Now, all bets are off on whether we used the HTS-fingerprint to _find_ or
exclude molecules [wink]
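For the curious, a minimal Python sketch of such a two-bin "fingerprint". The molecule representation (a set of element symbols plus a list of bond angles in degrees) and the list of "exotic" elements are assumptions for illustration; only the boron example and the 120-degree figure come from the post above.

```python
# Hypothetical 1-bit "fingerprint": every molecule lands in bin 0 or bin 1.
# EXOTIC_ELEMENTS and the (elements, bond angles) representation are
# illustrative assumptions, not the original in-house code.
EXOTIC_ELEMENTS = {"B", "Si", "Se", "Te", "As"}
STRAIN_CUTOFF_DEG = 120.0  # cutoff quoted in the post


def hts_bit(elements, bond_angles_deg):
    """Return 1 if the molecule has an exotic atom or a strained angle, else 0."""
    has_exotic = any(el in EXOTIC_ELEMENTS for el in elements)
    is_strained = any(angle < STRAIN_CUTOFF_DEG for angle in bond_angles_deg)
    return int(has_exotic or is_strained)


# Very different molecules collide into the same bin:
print(hts_bit({"C", "H"}, [60.0]))        # cyclopropane-like angle -> 1
print(hts_bit({"C", "H", "B"}, [120.0]))  # contains boron -> 1
print(hts_bit({"C", "H"}, [120.0]))       # flat aromatic angles only -> 0
```

The point is the collision: a strained hydrocarbon and a boron compound get the same "fingerprint", so the bit works as a filter but can never tell you which molecule you started from.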

(ok returning to lurker-mode now)
-- 
j

On Fri, Apr 20, 2018 at 10:49 AM, Peter S. Shenkin <shen...@gmail.com>
wrote:

> Well, @jeff, there's no law saying that hashes must collide, and in fact
> some are designed to make collision extremely unlikely (can you say
> "SHA-2"?). But the ones in question here do collide relatively frequently,
> for at least some molecular fingerprint types.
>
> An interesting question (maybe only to me :-) ) would be how similar, in
> general, the structures are that exhibit identical fingerprints, for the
> well-known fingerprint types, for various fingerprint lengths. A
> sufficiently complicated molecule will give lots of on bits, and for (say)
> a 64-bit fingerprint, there can only be 64 possible fingerprints with all
> but one bit turned on.
>
> I realize that most fingerprints in common use today are longer than this,
> but still, looking back at 64- and 32-bit fingerprints with all but one
> bit on might give some insight. How short does a fingerprint of some
> particular type have to be for, say, 10% of CHEMBL molecules to exhibit an
> all-on pattern? How short does it have to be for, say, 10% of CHEMBL
> molecules to have an exact fingerprint match with some other molecule?
>
> -P
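A rough Python sketch of the experiment proposed above, assuming fingerprints have already been computed for a set of molecules (e.g. CHEMBL) and stored as Python ints of a power-of-two length. Shortening them by OR-folding the two halves together is an assumption here; regenerating the fingerprint natively at each length would be the other option. The 4-bit "fingerprints" in the example are made up for illustration.

```python
from collections import Counter


def fold(fp: int, n_bits: int, target_bits: int) -> int:
    """OR-fold an n_bits fingerprint down to target_bits (both powers of two)."""
    while n_bits > target_bits:
        half = n_bits // 2
        fp = (fp & ((1 << half) - 1)) | (fp >> half)  # OR lower and upper halves
        n_bits = half
    return fp


def collision_stats(fps, n_bits, target_bits):
    """Fraction of molecules that are all-on, and that share an exact fingerprint."""
    folded = [fold(fp, n_bits, target_bits) for fp in fps]
    all_on = (1 << target_bits) - 1
    counts = Counter(folded)
    frac_all_on = sum(f == all_on for f in folded) / len(folded)
    frac_shared = sum(counts[f] > 1 for f in folded) / len(folded)
    return frac_all_on, frac_shared


def longest_colliding_length(fps, n_bits, threshold=0.10):
    """Largest power-of-two length at which >= threshold of the molecules share
    an exact fingerprint with another molecule (None if never reached)."""
    length = n_bits
    while length >= 1:
        if collision_stats(fps, n_bits, length)[1] >= threshold:
            return length
        length //= 2
    return None


# Toy 4-bit fingerprints, for illustration only.
toy = [0b1111, 0b1110, 0b0111, 0b0001]
print(collision_stats(toy, 4, 4))        # (0.25, 0.0)
print(collision_stats(toy, 4, 2))        # (0.75, 0.75)
print(longest_colliding_length(toy, 4))  # 2
```

Because folding is a function of the longer fingerprint, two molecules that already match at length L still match at L/2 (and an all-on pattern stays all-on), so both fractions can only grow as the fingerprint shrinks. That is why scanning from long to short and stopping at the first hit is enough.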
>
> On Fri, Apr 20, 2018 at 1:03 PM, jeff godden <jgod...@gmail.com> wrote:
>
>> Long ago molecular fingerprints were referred to in the literature as
>> molecular hash functions. (y'know, those crazy mathematical algorithms
>> which permitted rapid lookup of some string in a lookup table)  As such, we
>> expected there to be the associated hash collisions (
>> https://en.wikipedia.org/wiki/Hash_table#Collision_resolution ).  All
>> this by way of saying that to go from fingerprint to the molecular
>> structure which produced it is traditionally impossible unless the
>> fingerprint no longer amounts to a hash(ing) function.
>> --
>> j
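To make the hash-collision point concrete, a minimal Python sketch of a hashed bit-vector fingerprint. The feature strings and the choice of SHA-256 reduced modulo the fingerprint length are assumptions for illustration, not any particular toolkit's scheme. Even with a collision-resistant hash like SHA-2, squeezing its output into a fixed number of bits means that once there are more distinct features than bits, some features must share a bit, and a set bit can no longer tell you which feature produced it.

```python
import hashlib


def feature_bit(feature: str, n_bits: int) -> int:
    """Map a feature (e.g. a path or substructure string) to a bit index.
    SHA-256 keeps the hash stable across runs; reducing it modulo n_bits is
    where collisions become unavoidable."""
    digest = hashlib.sha256(feature.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_bits


def hashed_fingerprint(features, n_bits: int = 64) -> int:
    """Set one bit per feature; returns the fingerprint as a Python int."""
    fp = 0
    for feat in features:
        fp |= 1 << feature_bit(feat, n_bits)
    return fp


def find_bit_collision(features, n_bits: int):
    """Return two distinct features that hash to the same bit, or None."""
    seen = {}
    for feat in dict.fromkeys(features):  # drop duplicates, keep order
        bit = feature_bit(feat, n_bits)
        if bit in seen:
            return seen[bit], feat
        seen[bit] = feat
    return None


# 65 distinct features into 64 bits: the pigeonhole principle guarantees a collision.
features = [f"path{i}" for i in range(65)]
a, b = find_bit_collision(features, 64)
print(a != b, hashed_fingerprint([a]) == hashed_fingerprint([b]))  # True True
```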
>>
>>
>> On Fri, Apr 20, 2018 at 9:56 AM, Peter S. Shenkin <shen...@gmail.com>
>> wrote:
>>
>>> Isn't it the case that more than one molecule can share an identical
>>> fingerprint? (Depending on the specific fingerprint.) Think p-biphenyl,
>>> extended to triphenyl, tetraphenyl, etc. Still, a GA or SA method could
>>> keep going and come up with multiple matches, plus multiple near-misses.
>>>
>>> -P.
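A toy Python illustration of the p-polyphenyl point, assuming a path-based presence/absence fingerprint with a maximum path length. Each ring is collapsed to a single unit ('t' for a terminal phenyl, 'm' for an internal para-phenylene), which is a deliberate simplification of the real atom-level paths. Once the chain is longer than the longest path the fingerprint enumerates, adding another ring adds no new path, so the feature set (and therefore every bit) stops changing.

```python
def polyphenyl_chain(n_rings: int) -> str:
    """Toy para-polyphenyl: 't' = terminal phenyl, 'm' = internal para-phenylene.
    Each ring is collapsed to one unit; atom-level detail is ignored."""
    if n_rings < 2:
        raise ValueError("need at least two rings")
    return "t" + "m" * (n_rings - 2) + "t"


def path_features(chain: str, max_len: int) -> frozenset:
    """All linear sub-paths of 1..max_len units. Paths are undirected, so each
    is stored in a canonical orientation (the smaller of it and its reverse)."""
    feats = set()
    for k in range(1, max_len + 1):
        for i in range(len(chain) - k + 1):
            sub = chain[i:i + k]
            feats.add(min(sub, sub[::-1]))
    return frozenset(feats)


# With paths of up to 3 units, 5 and 7 rings give identical feature sets,
# while 4 rings still lacks the 'mmm' path.
print(path_features(polyphenyl_chain(5), 3) == path_features(polyphenyl_chain(7), 3))  # True
print(path_features(polyphenyl_chain(4), 3) == path_features(polyphenyl_chain(5), 3))  # False
```

A count-based fingerprint would still tell these apart, since the number of 'mmm' paths keeps growing with chain length; the collision comes from recording only presence or absence.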
>>>
>>> On Fri, Apr 20, 2018 at 10:58 AM, David Cosgrove <
>>> davidacosgrov...@gmail.com> wrote:
>>>
>>>> Hi Brian,
>>>> Dave Weininger once showed a fairly simple GA that could generally
>>>> deduce a structure from a Daylight fingerprint by using SMILES strings as
>>>> the chromosomes and Tanimoto distance to the target fingerprint as the
>>>> fitness function.  He may have done a talk about it for MUG or conceivably
>>>> written it up. It’d be in JCICS if so, I expect.
>>>>
>>>> You could probably knock up a script to do that in a couple of hours I
>>>> would think using a GA library to do the mechanics. If you’re not worried
>>>> about high efficiency, you don’t need to do anything fancy with mutation
>>>> and crossover of the SMILES strings to ensure you always get a valid
>>>> molecule, you can just give a fitness of 0 if the SMILES parser doesn’t
>>>> like what you give it.
>>>> HTH,
>>>> Dave
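A bare-bones Python sketch of the GA described above: chromosomes are SMILES-like strings, fitness is Tanimoto similarity to the target fingerprint, and anything the "parser" rejects scores 0. To keep it self-contained and runnable without RDKit, the parser and fingerprint are hypothetical toy stand-ins (balanced parentheses; hashed character n-grams), and the GA loop is hand-rolled rather than taken from a GA library. In an RDKit version one would instead check `Chem.MolFromSmiles(s) is not None` and compare `Chem.RDKFingerprint` bit vectors with `DataStructs.TanimotoSimilarity`.

```python
import hashlib
import random

ALPHABET = "CNOc1()="  # toy SMILES-like characters (assumption)
MAX_LEN = 40           # cap chromosome length so crossover cannot grow strings forever


def toy_parse_ok(s: str) -> bool:
    """Hypothetical stand-in for a SMILES parser: non-empty, balanced parentheses."""
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return bool(s) and depth == 0


def toy_fingerprint(s: str, n_bits: int = 256) -> frozenset:
    """Hypothetical fingerprint: on-bits from hashed character 1- to 3-grams."""
    bits = set()
    for k in (1, 2, 3):
        for i in range(len(s) - k + 1):
            digest = hashlib.sha256(s[i:i + k].encode()).digest()
            bits.add(int.from_bytes(digest[:4], "big") % n_bits)
    return frozenset(bits)


def tanimoto(a, b) -> float:
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def fitness(s: str, target_fp) -> float:
    # Strings the "parser" rejects simply score 0, as suggested above.
    return tanimoto(toy_fingerprint(s), target_fp) if toy_parse_ok(s) else 0.0


def mutate(s: str, rng: random.Random) -> str:
    i = rng.randrange(len(s) + 1)
    op = rng.choice(("insert", "delete", "replace"))
    if op == "insert" or not s:
        return s[:i] + rng.choice(ALPHABET) + s[i:]
    i = min(i, len(s) - 1)
    if op == "delete":
        return s[:i] + s[i + 1:]
    return s[:i] + rng.choice(ALPHABET) + s[i + 1:]


def crossover(a: str, b: str, rng: random.Random) -> str:
    # Naive one-point crossover on raw strings; children are often invalid.
    return (a[:rng.randrange(len(a) + 1)] + b[rng.randrange(len(b) + 1):])[:MAX_LEN]


def run_ga(target_fp, pop_size=60, generations=80, seed=0):
    rng = random.Random(seed)
    pop = ["".join(rng.choice(ALPHABET) for _ in range(rng.randint(3, 10)))
           for _ in range(pop_size)]
    history = []  # best fitness seen in each generation
    for _ in range(generations):
        ranked = sorted(pop, key=lambda s: fitness(s, target_fp), reverse=True)
        best = ranked[0]
        history.append(fitness(best, target_fp))
        parents = ranked[:max(2, pop_size // 4)]  # truncation selection
        children = [best]  # elitism: the best chromosome always survives
        while len(children) < pop_size:
            child = crossover(rng.choice(parents), rng.choice(parents), rng)
            if rng.random() < 0.5:
                child = mutate(child, rng)
            children.append(child[:MAX_LEN])
        pop = children
    best = max(pop, key=lambda s: fitness(s, target_fp))
    return best, fitness(best, target_fp), history


best, score, history = run_ga(toy_fingerprint("CC(=O)Oc1ccccc1"))
print(best, round(score, 2))
```

Elitism (carrying the best string forward unchanged) means the best fitness never drops between generations; the naive string crossover wastes many evaluations on invalid children, which is the efficiency trade-off mentioned above.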
>>>>
>>>>
>>>> On Fri, 20 Apr 2018 at 14:45, Nils Weskamp <nils.wesk...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Brian,
>>>>>
>>>>> in general, it might be difficult to come up with a deterministic
>>>>> algorithm that generates exactly one structure for a given fingerprint due
>>>>> to many ambiguities in the process. If you are happy with a more "fuzzy"
>>>>> (approximate / probabilistic) approach, you might want to take a look at
>>>>>
>>>>> https://pubs.acs.org/doi/abs/10.1021/ci600383v
>>>>> https://link.springer.com/article/10.1007/s10822-005-9020-4
>>>>>
>>>>> Given this task, I would probably start with a large database of known
>>>>> compounds (PubChem, UniChem, GDB17), calculate fingerprints and then do a
>>>>> similarity search with my query fingerprint.
>>>>>
>>>>> Hope this helps,
>>>>> Nils
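A minimal Python sketch of the database-lookup approach suggested above, assuming fingerprints for the reference compounds (e.g. from PubChem, UniChem or GDB17) have already been computed with the same fingerprint type and length as the query and stored as Python ints. The three-entry database is made up for illustration; for a real database, toolkit routines such as RDKit's `DataStructs.BulkTanimotoSimilarity` would be the faster route.

```python
import heapq


def tanimoto(a: int, b: int) -> float:
    """Tanimoto similarity of two bit-vector fingerprints stored as Python ints."""
    union = bin(a | b).count("1")
    return bin(a & b).count("1") / union if union else 0.0


def nearest_neighbours(query_fp: int, database: dict, k: int = 5):
    """Return up to k (similarity, compound_id) pairs, most similar first.
    `database` maps a compound identifier to its precomputed fingerprint."""
    scored = ((tanimoto(query_fp, fp), cid) for cid, fp in database.items())
    return heapq.nlargest(k, scored)


# Toy 4-bit fingerprints standing in for a real compound database.
db = {"A": 0b1111, "B": 0b0011, "C": 0b1000}
print(nearest_neighbours(0b0111, db, k=2))  # [(0.75, 'A'), (0.6666666666666666, 'B')]
```

This only returns the closest *known* compounds; whether the structure that produced the query fingerprint is among them depends entirely on the database.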
>>>>>
>>>>>
>>>>> On Fri, Apr 20, 2018 at 3:13 PM, Brian Cole <col...@gmail.com> wrote:
>>>>>
>>>>>> Hi Chem-informaticians:
>>>>>>
>>>>>> I know it has been talked about in the community that fingerprints
>>>>>> are not a way to obfuscate molecules for security, but I don't recall a
>>>>>> paper actually demonstrating the reverse engineering of a fingerprint into
>>>>>> a chemical structure. Does anyone know if such a paper exists?
>>>>>>
>>>>>> Code using RDKit to demonstrate the functionality would be an obvious
>>>>>> bonus as well. :-)
>>>>>>
>>>>>> Thanks,
>>>>>> Brian
>>>>>>
>>>>>> ------------------------------------------------------------
>>>>>> ------------------
>>>>>> Check out the vibrant tech community on one of the world's most
>>>>>> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
>>>>>> _______________________________________________
>>>>>> Rdkit-discuss mailing list
>>>>>> Rdkit-discuss@lists.sourceforge.net
>>>>>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>>>>>
>>>>>>
>>>>>
>>>> --
>>>> David Cosgrove
>>>> Freelance computational chemistry and chemoinformatics developer
>>>> http://cozchemix.co.uk
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>
