Hello all,

I am curious about how to fold a count-vector fingerprint. I understand
that the most common way to fold a bit vector is to split it in half and
combine the two halves with a bitwise OR. I think this is how the function
rdkit.DataStructs.FoldFingerprint works in RDKit; correct me if I am wrong.
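
To make clear what I mean by folding a bit vector, here is a minimal
pure-Python sketch. It is only my understanding of the operation, not
RDKit's actual implementation; fold_bits is a hypothetical helper name, and
a plain list of 0/1 values stands in for a real bit vector.

```python
def fold_bits(bits, fold_factor=2):
    """Fold a list of 0/1 values by OR-ing its equal-sized pieces together.

    Hypothetical helper for illustration; not part of RDKit.
    """
    if fold_factor < 1 or len(bits) % fold_factor != 0:
        raise ValueError("length must be divisible by fold_factor")
    new_len = len(bits) // fold_factor
    folded = [0] * new_len
    for i, bit in enumerate(bits):
        if bit:
            # Position i in the long vector maps to i % new_len, which is
            # the same as OR-ing the pieces on top of each other.
            folded[i % new_len] = 1
    return folded


# Bits 0 and 5 are set in an 8-bit vector; folding in half gives 4 bits.
print(fold_bits([1, 0, 0, 0, 0, 1, 0, 0]))
```

With fold_factor=2 this is exactly "split in half and OR"; larger factors
repeat the same idea with more pieces.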

How does RDKit fold count vectors such as the AtomPair, Morgan, and
topological torsion fingerprints, or, if it does not, what is the
appropriate way to fold them?

I thought about converting the fingerprint into a bit vector using its
respective "AsBitVect" method and then folding the result with
rdkit.DataStructs.FoldFingerprint, but topological torsion doesn't have an
"AsBitVect" method [https://www.rdkit.org/docs/GettingStartedInPython.html].

As an explicit example, the AtomPair fingerprint computed below is
extremely sparse. Could this AtomPair fingerprint be folded to increase
the density?

>>> from rdkit import Chem
>>> from rdkit.Chem import AllChem

>>> mol = Chem.MolFromSmiles('CC1CCCCC1')
>>> ap_fp = AllChem.GetAtomPairFingerprint(mol, minLength=1, maxLength=3)

>>> number_of_nonzero_elements = len(ap_fp.GetNonzeroElements().values())

>>> print((ap_fp.GetLength(),number_of_nonzero_elements))
(8388608, 9)
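
One approach I can imagine, by analogy with the bit-vector case, is to map
each nonzero index to index % new_length and sum the counts that collide.
The sketch below is an assumption on my part, not something I know RDKit
does; fold_counts is a hypothetical helper, and the indices and counts in
the example are made up rather than taken from the molecule above.

```python
from collections import Counter


def fold_counts(nonzero, length, new_length):
    """Fold a sparse count vector given as an {index: count} dict.

    Hypothetical helper for illustration; colliding counts are summed,
    which is an assumed convention rather than a documented RDKit one.
    """
    if new_length <= 0 or length % new_length != 0:
        raise ValueError("length must be divisible by new_length")
    folded = Counter()
    for idx, count in nonzero.items():
        folded[idx % new_length] += count
    return dict(folded)


# Made-up sparse counts in a vector of length 8388608, folded to 1024.
example = {3: 2, 1027: 1, 2051: 4, 5: 1}
print(fold_counts(example, 8388608, 1024))
```

For the fingerprint above, the dict returned by ap_fp.GetNonzeroElements()
and the value of ap_fp.GetLength() could be passed in directly. Taking the
maximum of colliding counts instead of the sum would be another option, and
I do not know which is preferred.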

Very Respectfully,

Ben
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
