Hello, based on this article: https://jcheminf.biomedcentral.com/articles/10.1186/s13321-017-0195-1
I have been trying to make what they call a 'database fingerprint'. The first step seems to require obtaining the frequency of each fingerprint bit in a database of molecules. To do that, I calculated the fingerprints of a list of molecules (much larger than the one below; this is just an example):

    from rdkit import Chem
    from rdkit.Chem import rdMolDescriptors

    ms = [Chem.MolFromSmiles(s) for s in ['c1ccccc1', 'CCC', 'CCCO']]
    fps = [rdMolDescriptors.GetMorganFingerprint(m, 3, useCounts=False) for m in ms]

My first attempt to obtain the database fingerprint was to loop through the fps and sum them (+=), as that is reported to be an allowed operation for these fingerprints. This worked, but was very slow.

My next attempt was to convert each fingerprint to a dictionary, and build the dictionary corresponding to the database fingerprint:

    database_fp_new = dict()
    for fp in fps:
        # GetNonzeroElements() gives a dictionary keyed by bit id
        for fpbit in fp.GetNonzeroElements():
            if fpbit in database_fp_new:
                database_fp_new[fpbit] += 1
            else:
                database_fp_new[fpbit] = 1

This worked too, gave the same result as the '+=' approach, and was much faster:

    {98513984: 1, 2763854213: 1, 3218693969: 1, 3741631696: 1, 2068133184: 1, 2245384272: 2, 2246728737: 2, 3542456614: 2, 864662311: 1, 1173125914: 1, 1365892349: 1, 1535166686: 1, 4023654873: 1}

However, I now have a dictionary, and what I need is a fingerprint, because I want to do operations like similarity calculations (e.g. https://www.rdkit.org/docs/source/rdkit.DataStructs.cDataStructs.html?highlight=bulktanimoto#rdkit.DataStructs.cDataStructs.BulkTanimotoSimilarity ).

Would anyone be able to suggest if and how the dictionary can be turned back into a fingerprint, or perhaps advise how to make the database fingerprint in a different way, if the one I figured out is not optimal?

Thank you
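For reference, the counting step above can be sketched without RDKit, using plain dictionaries as stand-ins for what fp.GetNonzeroElements() returns. This is only a minimal sketch under stated assumptions: the toy bit ids are made up, frequent_bits and tanimoto are hypothetical helper names, and the 0.5 frequency cutoff is an arbitrary illustrative value, not one taken from the article. It does not turn the result back into an RDKit fingerprint object; it only shows that the frequency dictionary already supports a set-based Tanimoto comparison.

```python
from collections import Counter


def count_bit_frequencies(nonzero_maps):
    """Count in how many molecules each fingerprint bit is set.

    nonzero_maps: iterable of dict-like objects keyed by bit id,
    e.g. [fp.GetNonzeroElements() for fp in fps].
    """
    counts = Counter()
    for elements in nonzero_maps:
        # Dictionary keys are unique, so each bit counts once per molecule.
        counts.update(elements.keys())
    return counts


def frequent_bits(counts, n_molecules, min_fraction):
    """Hypothetical helper: bits set in at least min_fraction of the molecules."""
    return {bit for bit, n in counts.items() if n / n_molecules >= min_fraction}


def tanimoto(bits_a, bits_b):
    """Set-based Tanimoto coefficient between two collections of bit ids."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


# Toy stand-ins for the per-molecule dictionaries; the bit ids are invented.
toy_maps = [{11: 1, 22: 1}, {22: 1, 33: 1}, {22: 1, 33: 1, 44: 1}]

counts = count_bit_frequencies(toy_maps)
# Assumed cutoff of 0.5, chosen only for illustration.
database_bits = frequent_bits(counts, len(toy_maps), min_fraction=0.5)
# Compare the thresholded database bits with the first toy molecule.
similarity = tanimoto(database_bits, toy_maps[0])
```

Counter.update with an iterable of keys increments each key by one, which replaces the explicit if/else in the loop. Whether a thresholded bit set or the raw counts is the better representation for similarity searching depends on the definition in the article.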
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss