Well, @jeff, there's no law saying that hashes must collide, and in fact
some are designed to make collision extremely unlikely (can you say
"SHA-2"?). But the ones in question here do collide relatively frequently,
for at least some molecular fingerprint types.
An interesting question (maybe only to me :-) ) would be how similar, in
general, the structures that share identical fingerprints are, for the
well-known fingerprint types and for various fingerprint lengths. A
sufficiently complicated molecule will set lots of bits, and for (say)
a 64-bit fingerprint, there can only be 64 possible fingerprints with all
but one bit turned on.
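The counting behind that last sentence: an n-bit fingerprint with exactly n-1 bits set is fixed by which single bit is off, so there are C(n, n-1) = n such patterns, and only n + 1 if the all-on pattern is included. A few lines of Python make the arithmetic explicit; `patterns_with_at_least` is a hypothetical helper name, not part of any library.

```python
from math import comb


def patterns_with_at_least(n_bits: int, min_on: int) -> int:
    """Count distinct n_bits-long bit patterns with at least min_on bits set."""
    return sum(comb(n_bits, k) for k in range(min_on, n_bits + 1))


if __name__ == "__main__":
    print(comb(64, 63))                    # exactly one bit off -> 64
    print(patterns_with_at_least(64, 63))  # one bit off, or all on -> 65
    print(patterns_with_at_least(64, 60))  # at most four bits off -> 679121
```

Even allowing four bits off, a 64-bit fingerprint has fewer than 700,000 near-saturated patterns, which is small next to the compound counts of databases like PubChem.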
I realize that most fingerprints in common use today are longer than this,
but still, looking back at 64- and 32-bit fingerprints with all but one
bit on might give some insight. How short does a fingerprint of some
particular type have to be for, say, 10% of ChEMBL molecules to exhibit an
all-on pattern? How short does it have to be for, say, 10% of ChEMBL
molecules to have an exact fingerprint match with some other molecule?
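One way to probe those two questions empirically, sketched under assumptions: start from long, unfolded on-bit sets (for ChEMBL these would come from a fingerprinting toolkit such as RDKit; here they are synthetic), fold them by taking each bit index modulo the target length, and measure the fraction of molecules whose folded fingerprint is shared with another molecule or is all-on. The function names are hypothetical, and modulo folding is an assumption; for power-of-two lengths it matches folding in halves, which keeps collisions nested (two fingerprints that collide at n bits also collide at n/2), so scanning from long to short lengths is meaningful.

```python
import random
from collections import Counter


def fold(on_bits: frozenset[int], n_bits: int) -> frozenset[int]:
    """Fold an on-bit set down to n_bits by taking each index modulo n_bits."""
    return frozenset(b % n_bits for b in on_bits)


def collision_stats(fingerprints: list[frozenset[int]], n_bits: int) -> tuple[float, float]:
    """Return (fraction sharing an exact folded fingerprint with another entry,
    fraction whose folded fingerprint has every bit set)."""
    folded = [fold(fp, n_bits) for fp in fingerprints]
    counts = Counter(folded)
    shared = sum(1 for fp in folded if counts[fp] > 1)
    all_on = sum(1 for fp in folded if len(fp) == n_bits)
    return shared / len(folded), all_on / len(folded)


def longest_length_reaching(fingerprints, threshold, lengths):
    """Largest length in `lengths` at which the shared fraction reaches `threshold`."""
    for n in sorted(lengths, reverse=True):
        if collision_stats(fingerprints, n)[0] >= threshold:
            return n
    return None


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic stand-in data: 1000 "molecules", each with 40 on bits out of 2048.
    demo = [frozenset(rng.sample(range(2048), 40)) for _ in range(1000)]
    for n in (2048, 256, 64, 32):
        shared, all_on = collision_stats(demo, n)
        print(f"{n:5d} bits: {shared:.1%} shared, {all_on:.1%} all-on")
    print(longest_length_reaching(demo, 0.10, [2 ** k for k in range(3, 12)]))
```

With real data, duplicate entries for the same compound would need to be removed first, or they would be counted as collisions.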
-P
On Fri, Apr 20, 2018 at 1:03 PM, jeff godden <jgod...@gmail.com> wrote:
> Long ago molecular fingerprints were referred to in the literature as
> molecular hash functions (y'know, those crazy mathematical algorithms
> that permitted rapid lookup of some string in a lookup table). As such, we
> expected there to be the associated hash collisions
> (https://en.wikipedia.org/wiki/Hash_table#Collision_resolution). All
> this by way of saying that going from a fingerprint back to the molecular
> structure that produced it is traditionally impossible unless the
> fingerprint no longer amounts to a hash(ing) function.
> --
> j
>
>
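Jeff's point can be made concrete with a toy hashed fingerprint. The sketch assumes each structural feature has already been reduced to an identifier string (real fingerprints hash paths or atom environments; the `feature-i` names here are hypothetical) and maps it to one of n bits using SHA-256 modulo n. SHA-256 belongs to the SHA-2 family Peter mentions, but reducing its output to a handful of bits brings collisions right back: by the pigeonhole principle, n + 1 distinct features cannot all land on different bits of an n-bit fingerprint.

```python
import hashlib
from itertools import count


def feature_bit(feature: str, n_bits: int) -> int:
    """Map a feature identifier to a bit position: SHA-256 digest, reduced modulo n_bits."""
    digest = hashlib.sha256(feature.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_bits


def hashed_fingerprint(features, n_bits: int) -> frozenset[int]:
    """Set of on bits produced by hashing every feature identifier."""
    return frozenset(feature_bit(f, n_bits) for f in features)


def find_bit_collision(n_bits: int) -> tuple[str, str]:
    """Return two distinct feature ids that hash to the same bit.
    By the pigeonhole principle this happens within the first n_bits + 1 ids."""
    seen: dict[int, str] = {}
    for i in count():
        feature = f"feature-{i}"
        bit = feature_bit(feature, n_bits)
        if bit in seen:
            return seen[bit], feature
        seen[bit] = feature


if __name__ == "__main__":
    a, b = find_bit_collision(64)
    # Two different one-feature "structures" with identical 64-bit fingerprints:
    print(a, b, hashed_fingerprint([a], 64) == hashed_fingerprint([b], 64))
```

Given only the fingerprint, nothing distinguishes `a` from `b`, which is why the inverse mapping is not unique.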
> On Fri, Apr 20, 2018 at 9:56 AM, Peter S. Shenkin <shen...@gmail.com>
> wrote:
>
>> Isn't it the case that more than one molecule can share an identical
>> fingerprint? (Depending on the specific fingerprint.) Think p-biphenyl,
>> extended to triphenyl, tetraphenyl, etc. Still, a GA or SA method could
>> keep going and come up with multiple matches, plus multiple near-misses.
>>
>> -P.
>>
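Peter's p-polyphenyl example is easiest to see on an unbranched chain. The toy below is an assumption, not any toolkit's fingerprint: it treats a molecule as a string of atom labels, enumerates every linear path of 1 to `max_len` atoms as a feature, and ignores branching, bond orders, and feature counts. Once a chain is longer than `max_len` atoms, adding another repeat unit creates no new feature, so homologues collide exactly, which is the effect Peter describes.

```python
def linear_paths(atoms: str, max_len: int) -> frozenset[str]:
    """Toy path fingerprint: every contiguous run of 1..max_len atom labels
    in an unbranched chain, with a path and its reverse treated as one feature."""
    paths = set()
    for length in range(1, max_len + 1):
        for start in range(len(atoms) - length + 1):
            path = atoms[start:start + length]
            paths.add(min(path, path[::-1]))  # canonical direction
    return frozenset(paths)


if __name__ == "__main__":
    hexane, heptane = "CCCCCC", "CCCCCCC"  # hydrogen-suppressed carbon chains
    print(linear_paths(hexane, 4) == linear_paths(heptane, 4))  # True: paths too short to tell them apart
    print(linear_paths(hexane, 7) == linear_paths(heptane, 7))  # False: a 7-atom path exists only in heptane
```

A search method such as the GA or SA Peter mentions would therefore keep finding several exact matches for the same target whenever the fingerprint saturates like this.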
>> On Fri, Apr 20, 2018 at 10:58 AM, David Cosgrove <
>> davidacosgrov...@gmail.com> wrote:
>>
>>> Hi Brian,
>>> Dave Weininger once showed a fairly simple GA that could generally
>>> deduce a structure from a Daylight fingerprint by using SMILES strings as
>>> the chromosomes and Tanimoto distance to the target fingerprint as the
>>> fitness function. He may have given a talk about it at MUG or conceivably
>>> written it up. It’d be in JCICS if so, I expect.
>>>
>>> You could probably knock up a script to do that in a couple of hours, I
>>> would think, using a GA library to do the mechanics. If you’re not worried
>>> about high efficiency, you don’t need to do anything fancy with mutation
>>> and crossover of the SMILES strings to ensure you always get a valid
>>> molecule; you can just give a fitness of 0 if the SMILES parser doesn’t
>>> like what you give it.
>>> HTH,
>>> Dave
>>>
>>>
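A minimal sketch of the GA Dave describes, written in plain Python so it runs without RDKit. It is not Weininger's implementation. The parser and fingerprint (`toy_parse`, `toy_fingerprint`) are hypothetical stand-ins; for a real run one would swap in RDKit's `Chem.MolFromSmiles` (which returns `None` for strings it cannot parse) and a fingerprint such as `Chem.RDKFingerprint`, scoring with `DataStructs.TanimotoSimilarity`. Fitness here is Tanimoto similarity (one minus the Tanimoto distance), and invalid strings score 0 as Dave suggests. Population size, mutation rate, alphabet, and truncation selection are arbitrary choices.

```python
import random
import zlib

ALPHABET = "CNOc()=1"  # symbols the toy GA may insert; an arbitrary choice


def tanimoto(a: frozenset[int], b: frozenset[int]) -> float:
    """Tanimoto similarity of two on-bit sets (1 minus the Tanimoto distance)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def toy_parse(smiles: str):
    """Hypothetical stand-in for a SMILES parser: returns None for empty strings,
    unbalanced parentheses or an odd number of ring-closure digits."""
    depth = 0
    for ch in smiles:
        depth += {"(": 1, ")": -1}.get(ch, 0)
        if depth < 0:
            return None
    if not smiles or depth != 0 or smiles.count("1") % 2:
        return None
    return smiles


def toy_fingerprint(smiles: str, n_bits: int = 256) -> frozenset[int]:
    """Hypothetical stand-in fingerprint: character 1- to 3-grams hashed into n_bits."""
    grams = {smiles[i:i + k] for k in (1, 2, 3) for i in range(len(smiles) - k + 1)}
    return frozenset(zlib.crc32(g.encode()) % n_bits for g in grams)


def fitness(smiles: str, target: frozenset[int]) -> float:
    """Similarity to the target; 0 if the parser rejects the string."""
    if toy_parse(smiles) is None:
        return 0.0
    return tanimoto(toy_fingerprint(smiles), target)


def mutate(s: str, rng: random.Random) -> str:
    """Insert, replace or delete one random character."""
    op = rng.choice(("insert", "replace", "delete"))
    if op == "insert" or not s:
        i = rng.randint(0, len(s))
        return s[:i] + rng.choice(ALPHABET) + s[i:]
    i = rng.randrange(len(s))
    if op == "replace":
        return s[:i] + rng.choice(ALPHABET) + s[i + 1:]
    return s[:i] + s[i + 1:]


def crossover(a: str, b: str, rng: random.Random) -> str:
    """One-point crossover: a prefix of one parent joined to a suffix of the other."""
    return a[:rng.randint(0, len(a))] + b[rng.randint(0, len(b)):]


def run_ga(target: frozenset[int], pop_size: int = 60, generations: int = 200, seed: int = 0):
    """Truncation-selection GA with elitism; returns (best string, best fitness per generation)."""
    rng = random.Random(seed)
    population = ["C" * rng.randint(1, 6) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        ranked = sorted(population, key=lambda s: fitness(s, target), reverse=True)
        history.append(fitness(ranked[0], target))
        parents = ranked[: pop_size // 2]
        children = [ranked[0]]  # elitism: the best string always survives
        while len(children) < pop_size:
            child = crossover(rng.choice(parents), rng.choice(parents), rng)
            if rng.random() < 0.5:
                child = mutate(child, rng)
            children.append(child)
        population = children
    return max(population, key=lambda s: fitness(s, target)), history


if __name__ == "__main__":
    target = toy_fingerprint("CC(=O)Oc1ccccc1")  # pretend this is the fingerprint we were handed
    best, history = run_ga(target)
    print(best, round(history[-1], 3))
```

Because the best string is carried over unchanged each generation, the best fitness never decreases; whether it reaches 1.0 depends on the target and the run length.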
>>> On Fri, 20 Apr 2018 at 14:45, Nils Weskamp <nils.wesk...@gmail.com>
>>> wrote:
>>>
>>>> Hi Brian,
>>>>
>>>> in general, it might be difficult to come up with a deterministic
>>>> algorithm that generates exactly one structure for a given fingerprint due
>>>> to many ambiguities in the process. If you are happy with a more "fuzzy"
>>>> (approximate / probabilistic) approach, you might want to take a look at
>>>>
>>>> https://pubs.acs.org/doi/abs/10.1021/ci600383v
>>>> https://link.springer.com/article/10.1007/s10822-005-9020-4
>>>>
>>>> Given this task, I would probably start with a large database of known
>>>> compounds (PubChem, UniChem, GDB17), calculate fingerprints and then do a
>>>> similarity search with my query fingerprint.
>>>>
>>>> Hope this helps,
>>>> Nils
>>>>
>>>>
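Nils's database route reduces to a ranked similarity search. A minimal sketch, assuming the library fingerprints were computed beforehand and stored as on-bit sets keyed by compound identifier (the identifiers and bit sets below are made up); with RDKit bit vectors, `DataStructs.BulkTanimotoSimilarity` performs the scoring loop in one call. Returning every entry above a threshold, rather than only the top hit, also surfaces the multiple exact matches and near misses Peter mentions.

```python
def tanimoto(a: frozenset[int], b: frozenset[int]) -> float:
    """Tanimoto similarity of two on-bit sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def similarity_search(query: frozenset[int], library: dict[str, frozenset[int]],
                      threshold: float = 0.0, top_k: int = 10) -> list[tuple[float, str]]:
    """Return up to top_k (similarity, compound id) pairs at or above threshold, best first."""
    hits = [(tanimoto(query, bits), cid) for cid, bits in library.items()]
    hits = [hit for hit in hits if hit[0] >= threshold]
    hits.sort(key=lambda hit: (-hit[0], hit[1]))  # ties broken by id for stable output
    return hits[:top_k]


if __name__ == "__main__":
    # Made-up entries standing in for precomputed PubChem/UniChem/GDB17 fingerprints.
    library = {
        "cmpd-A": frozenset({3, 17, 42, 99}),
        "cmpd-B": frozenset({3, 17, 42}),
        "cmpd-C": frozenset({5, 8}),
    }
    query = frozenset({3, 17, 42, 99})
    for score, cid in similarity_search(query, library, threshold=0.5):
        print(f"{cid}: {score:.2f}")
```

This only finds structures that are already in the database; the GA approach above can propose structures that are not.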
>>>> On Fri, Apr 20, 2018 at 3:13 PM, Brian Cole <col...@gmail.com> wrote:
>>>>
>>>>> Hi Chem-informaticians:
>>>>>
>>>>> I know it has been talked about in the community that fingerprints are
>>>>> not a way to obfuscate molecules for security, but I don't recall a paper
>>>>> actually demonstrating the reverse engineering of a fingerprint into a
>>>>> chemical structure. Does anyone know if such a paper exists?
>>>>>
>>>>> Code using RDKit to demonstrate the functionality would be an obvious
>>>>> bonus as well. :-)
>>>>>
>>>>> Thanks,
>>>>> Brian
>>>>>
>>>>> ------------------------------------------------------------
>>>>> ------------------
>>>>> Check out the vibrant tech community on one of the world's most
>>>>> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
>>>>> _______________________________________________
>>>>> Rdkit-discuss mailing list
>>>>> Rdkit-discuss@lists.sourceforge.net
>>>>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>>>>
>>>>>
>>>>
>>> --
>>> David Cosgrove
>>> Freelance computational chemistry and chemoinformatics developer
>>> http://cozchemix.co.uk
>>>
>>>
>>>
>>>
>>
>>
>>
>