Hello,
I'm a statistician with next to no knowledge of chemistry, but I want to
test a new approach for generating Tanimoto similarities based on an
alternative type of fingerprint. To do that I want to use rdkit to
generate feature sets from molecules, and use those to construct
fingerprints. I have tried to work out how to do this by reading the
docs and looking at the code, but haven't been able to find the relevant
code (I'm guessing it's in C++ somewhere, and I'm only really fluent in
Python). The nearest I've been able to get is:


>>> import rdkit
>>> from rdkit import Chem
>>> m = Chem.MolFromSmiles('Cc1ccccc1')
>>> len(Chem.RDKFingerprint(m, nBitsPerHash=1, fpSize=1024).GetOnBits())
18
>>> len(Chem.RDKFingerprint(m, nBitsPerHash=1, fpSize=2048).GetOnBits())
18
>>> len(Chem.RDKFingerprint(m, nBitsPerHash=1, fpSize=4096).GetOnBits())
19
>>> len(Chem.RDKFingerprint(m, nBitsPerHash=1, fpSize=16384).GetOnBits())
19


So it appears that there are probably 19 features, and I could take the
set bit positions and use them to construct fingerprints. But I'd rather
cut out the uncertainty over hash collisions (presumably the reason the
two smaller fingerprints show 18 on bits rather than 19). I want to
compare my
approach with that of Kristensen et al. (2010) who used 2 million
commercially available molecules from the ZINC database (version 8).
They used the CDK fingerprint generator, but don't provide further details.
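
To illustrate the collision worry, here is a minimal sketch in plain
Python. It is not RDKit's actual hashing scheme (which I haven't found);
the feature names and the fold_features helper are hypothetical
placeholders. The point is only that folding a feature set into a fixed
number of bits can never give more on bits than there are features, and
may give fewer.

```python
import hashlib

def fold_features(features, fp_size):
    """Map each feature to one bit position in a fingerprint of fp_size bits.

    Uses SHA-1 purely as a stable stand-in hash; this is an assumption for
    illustration, not what RDKit does internally.
    """
    on_bits = set()
    for feature in features:
        digest = hashlib.sha1(feature.encode("utf-8")).digest()
        on_bits.add(int.from_bytes(digest[:8], "big") % fp_size)
    return on_bits

# Hypothetical feature labels standing in for the 19 features guessed above.
features = [f"feature_{i}" for i in range(19)]

for fp_size in (32, 1024, 16384):
    n_on = len(fold_features(features, fp_size))
    # n_on < 19 means at least two features collided on the same bit.
    print(fp_size, n_on)
```

Counting on bits for increasing sizes, as in my session above, only
gives a lower bound on the number of distinct features.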

So, is there some way I can generate features directly? (This will also
allow me to calculate the true Tanimoto scores to compare with the
estimates generated by fingerprints.) Any help regarding suitable test
data and feature sets would also be appreciated. My download attempts
for ZINC data keep failing after a few hundred KB and I don't want to
use CDK if not needed. Thanks (in advance).
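
By the true Tanimoto score I mean the set-based (Jaccard) coefficient
computed on the feature sets themselves. A minimal sketch, assuming
features can be represented as hashable Python objects; the tanimoto
helper and the example sets are made up for illustration:

```python
def tanimoto(a, b):
    """Exact Tanimoto (Jaccard) coefficient between two feature sets."""
    a, b = set(a), set(b)
    union = len(a | b)
    if union == 0:
        # Convention assumed here: two empty sets are treated as identical.
        return 1.0
    return len(a & b) / union

# Hypothetical feature sets for two molecules.
features_a = {"f1", "f2", "f3"}
features_b = {"f2", "f3", "f4"}

print(tanimoto(features_a, features_b))  # 2 shared / 4 total -> 0.5
```

The same function could be applied to the sets of on bits from two
fingerprints, giving the estimate to compare against the exact value.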

Duncan Smith


Kristensen et al. (2010) A tree-based method for the rapid screening of
chemical fingerprints. Algorithms for Molecular Biology 5:9


_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
