>
> >  which could of course also be changed to something expensive to
> calculate.
> Yes, that could be possible. Abstractly, let the first 20 bytes of each
> fingerprint be a salt, and use something like bcrypt so each fingerprint
> test requires that the query structure be re-fingerprinted for the
> per-fingerprint hash function.

I think salting is a must. If any money is at stake, I'd expect a matching
amount of computing power to be spent on cracking it. The closest analogy,
and the usual workaround for a slow hash function, is "rainbow tables" for
strings: instead of computing the hash, you just look it up. Without
salting, such lookup tables would not be that big, I suppose. If you had
such a lookup table, you'd only need an algorithm (or a genetic algorithm)
that builds a molecule from a set of environments instead of building it
at random.
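
To make the salting point concrete, here is a rough Python sketch, not
anything RDKit provides. It assumes the fragments are already canonical
strings, uses the standard library's PBKDF2 as a stand-in for bcrypt, and
the names NUM_BITS, ITERATIONS, fragment_bit and salted_fingerprint are
made up for illustration. Each fingerprint carries its own 20-byte salt,
so a fragment-to-bit lookup table built for one salt is useless for
another.

```python
import hashlib
import os

NUM_BITS = 1024      # assumed fingerprint length (hypothetical)
ITERATIONS = 10_000  # assumed work factor; larger means slower per fragment


def fragment_bit(fragment: str, salt: bytes) -> int:
    """Map a canonical fragment string to a bit position with a slow, salted hash."""
    # PBKDF2-HMAC-SHA256 stands in for bcrypt here; it is not bcrypt.
    digest = hashlib.pbkdf2_hmac("sha256", fragment.encode("utf-8"), salt, ITERATIONS)
    return int.from_bytes(digest[:8], "big") % NUM_BITS


def salted_fingerprint(fragments, salt=None):
    """Return (salt, set of bit positions) for an iterable of fragment strings."""
    if salt is None:
        salt = os.urandom(20)  # fresh 20-byte salt per fingerprint
    return salt, {fragment_bit(f, salt) for f in fragments}


# Fragments of "CCO", taken from the quoted example below.
fragments = ["C", "O", "CC", "CO", "CCO"]

salt_a, bits_a = salted_fingerprint(fragments)

# Comparing a query against this fingerprint means re-fingerprinting the
# query with this fingerprint's own salt.
_, bits_query = salted_fingerprint(fragments, salt_a)
```

The price is the one Andrew mentions below: every comparison costs a full
re-hash of the query for that fingerprint's salt, so a similarity search
over many fingerprints gets very slow.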

----
Pozdrawiam,  |  Best regards,
Maciek Wójcikowski
[email protected]

2018-04-22 22:25 GMT+02:00 Andrew Dalke <[email protected]>:

> On Apr 22, 2018, at 20:22, Nils Weskamp <[email protected]> wrote:
> > Actually, I *was* also thinking about your use cases 2 and 3 since you
> > also need some form of hash function to map substructures to bit
> > numbers. This is normally a rather simple function / pseudo random
> > generator,
>
> Strictly speaking, this is not a requirement.
>
> The term "fingerprint" has taken on quite an encompassing meaning since
> 1990.
>
> The molecular formula is a count fingerprint with 118 keys, based on the
> atomic number. There's no need for a hash function there. "CCO" might be:
>   [0, 0, 0, 0, 0, 2, 0, 1, ...]
>
> Or it can be written in more compact form like {"C": 2, "O": 1}.
>
> As an alternative, I could use a mapping from canonical substructures to
> counts, so "CCO" becomes:
>
>   {"C": 2, "O": 1, "CC": 1, "CO": 1, "CCO": 1}
>
> This doesn't require a hash. (While I represent that as a Python
> dictionary, which uses a hash table underneath, it could be implemented
> using a red-black tree or B-tree, or with a simple linear search.)
>
> It's only if I want to convert this into a fixed-length representation
> that I have to figure out some sort of encoding scheme.
>
> Even then, I don't need a PRNG or hash seed. Suppose I use a bit vector. I
> could have a table which maps each canonical substructure to its bit
> pattern. If I have an unknown fragment, I could use RANDOM.ORG to get the
> bits.
>
> Downsides include potentially unbounded table growth and the need for a
> centralized table.
>
> This is the approach that Zatocoding used, and I see Chemical Zatocoding
> as the only precursor to Daylight hash fingerprints.
>
> >  which could of course also be changed to something expensive to
> calculate.
>
>
> Yes, that could be possible. Abstractly, let the first 20 bytes of each
> fingerprint be a salt, and use something like bcrypt so each fingerprint
> test requires that the query structure be re-fingerprinted for the
> per-fingerprint hash function.
>
> It would, however, take an absurdly long time to do a similarity search.
>
> And in any case, before going further along that path, we would need to
> figure out the risk model. Brian started by saying that he wanted to
> obfuscate molecules for security, but didn't say what he wants to use them
> for, and if he wants to secure them against nation-states, or simply
> against me. ;)
>
>
>
>                                 Andrew
>                                 [email protected]
>
>
>
> ------------------------------------------------------------
> ------------------
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> _______________________________________________
> Rdkit-discuss mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>