Just a note regarding the use of fingerprints for repeating structures such
as nanotubes. The bits in the fingerprint quickly become saturated as they
are based on local structural information which is the same again and again
within the repeating structure. For this reason, all peptides, for example,
appear equally similar to all other peptides (to a first approximation).
And all carbon nanotubes may appear equally similar also. Something to bear
in mind. This is one of the arguments for measuring similarity in a
different way, for example graph edit distance, or a descriptor that
captures some measurable global structural property related to a physical
property of interest.
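
To make the saturation point concrete, here is a toy sketch. It is not ECFP: the "structure" is just a string of repeating units, and the "fingerprint" is the set of all fragments up to three units long. Both the model and the helper names (`local_features`, `tanimoto`) are made up for illustration.

```python
def local_features(chain, max_len=3):
    """Set of all contiguous fragments of up to max_len units (toy 'fingerprint')."""
    return {chain[i:i + n]
            for n in range(1, max_len + 1)
            for i in range(len(chain) - n + 1)}


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity of two feature sets."""
    return len(a & b) / len(a | b)


# Two "nanotubes" built from the same repeat unit, one ten times longer.
short_tube = local_features("AB" * 5)
long_tube = local_features("AB" * 50)

print(len(short_tube), len(long_tube))  # 6 6
print(tanimoto(short_tube, long_tube))  # 1.0
```

Once every local environment has been seen, making the structure longer adds nothing new, so the two come out as identical even though their sizes differ by a factor of ten.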

- Noel

On Fri, 7 Dec 2018 at 10:22, Noel O'Boyle <baoille...@gmail.com> wrote:

> An ECFP4 implementation could use a single bit or a million bits. The
> actual information being encoded is an element of a set with billions of
> possible members (I forget the details), so it is hashed down to something
> manageable. The shorter the length, the more bit collisions (everything
> will collide with a single bit, for example). Open Babel uses 4096. I would
> regard this as the minimum.
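
The effect of length on collisions can be sketched in a few lines. This assumes the identifiers are 32-bit integers and that they are mapped to bit positions with a simple modulo; both are assumptions made for illustration, not a description of what Open Babel or RDKit do internally. The identifiers themselves are random stand-ins and `fold` is a made-up helper.

```python
import random


def fold(identifiers, nbits):
    """Map each integer identifier to a bit position by taking it modulo nbits."""
    return {ident % nbits for ident in identifiers}


random.seed(0)
# Hypothetical stand-ins for 32-bit substructure identifiers.
identifiers = {random.getrandbits(32) for _ in range(200)}

for nbits in (1, 64, 1024, 4096):
    bits = fold(identifiers, nbits)
    lost = len(identifiers) - len(bits)
    print(f"{nbits:5d} bits: {len(bits):3d} set, {lost:3d} lost to collisions")
```

With a single bit everything lands on position 0, and the number of distinct positions can only go up as the length increases.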
>
> When converting from hex, you could concatenate the binaries. Or you could
> use pybel, which does the conversion for you:
> >>> from openbabel import pybel  # Open Babel 2.x: import pybel
> >>> pybel.readstring("smi", "c1ccccc1C(=O)Cl").calcfp("ecfp4").bits
> [556, 1348, 1509, 1547, 1993, 2078, 2089, 2378, 2487, 2531, 2700, 3017,
> 3023, 3117, 3324, 3395, 3599, 4036]
>
> These are the bits that are set. If you use "len", you can get the number
> of them.
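
If a plain string of ones and zeros is what is wanted, the list of set bits can be expanded directly. A minimal sketch, assuming a 4096-bit fingerprint and that the indices pybel reports start at 1 (I believe they do, but it is worth checking, which is why `first_index` is a parameter). `bits_to_string` is a hypothetical helper, not part of pybel.

```python
def bits_to_string(set_bits, nbits=4096, first_index=1):
    """Expand a list of set-bit indices into a string of '0' and '1' characters."""
    chars = ["0"] * nbits
    for b in set_bits:
        pos = b - first_index
        if not 0 <= pos < nbits:
            raise ValueError(f"bit index {b} is outside a {nbits}-bit fingerprint")
        chars[pos] = "1"
    return "".join(chars)


# The set bits quoted above for c1ccccc1C(=O)Cl.
set_bits = [556, 1348, 1509, 1547, 1993, 2078, 2089, 2378, 2487, 2531, 2700,
            3017, 3023, 3117, 3324, 3395, 3599, 4036]

bit_string = bits_to_string(set_bits)
print(len(bit_string), bit_string.count("1"))  # 4096 18
```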
>
> Regards,
> - Noel
>
>
> On Fri, 7 Dec 2018 at 09:49, I. Camps <ica...@gmail.com> wrote:
>
>> @Geoff
>> I use Python.
>> I already made a script to convert hex to binary, but as I wrote
>> previously, the fingerprint (fp) from OpenBabel is in the form of a set of
>> hex numbers. I converted each one to binary and then concatenated all the
>> binaries. Is that okay?
>> If it is okay, the second problem is that the fp is much longer (6040)
>> than the RDKit one (1024). I really do not understand the "folded" issue,
>> because everything I read about ECFP4 talks about a 1024-bit string and
>> nothing longer.
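
For the hex route, the main pitfall is that Python's `bin()` drops leading zeros, so each word has to be padded back to its full width before concatenating. A minimal sketch, assuming the file holds whitespace-separated 8-digit (32-bit) hex words; the example words are invented, and `hex_words_to_bits` is a hypothetical helper. How the word order and the bit order within a word relate to the bit numbering is not something this sketch settles, so compare the result against the pybel `.bits` list for a known molecule.

```python
def hex_words_to_bits(hex_text, bits_per_word=32):
    """Convert whitespace-separated hex words to one zero-padded bit string."""
    words = hex_text.split()
    # format(..., "032b") keeps leading zeros, unlike bin().
    return "".join(format(int(w, 16), f"0{bits_per_word}b") for w in words)


# Invented example words, not real Open Babel output.
example = "00000001 80000000 0000ffff"
bit_string = hex_words_to_bits(example)

print(len(bit_string))        # 96
print(bit_string.count("1"))  # 18
```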
>>
>> @Francois
>> I certainly will take a look!
>>
>> thank you both.
>>
>> Camps
>>
>>
>> On Fri, Dec 7, 2018 at 1:59 AM Geoffrey Hutchison <
>> geoff.hutchi...@gmail.com> wrote:
>>
>>> Using OpenBabel, I got a file with the information that the fingerprint
>>> is a 6040 bits set and got hexadecimal numbers.
>>> Using PyBioMed, which is based on RDKit, I got a binary string of 1024
>>> bits, very different from that obtained with OpenBabel.
>>>
>>>
>>> The RDKit binary string will be "folded" down to 1024 bits, so of course
>>> they will be very different bit strings.
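
As a sketch of what "folding" means: a longer fingerprint is shortened by wrapping bit positions around the new length, so that two bits which were distinct can end up on the same position. The example uses the 18 set bits from the pybel call elsewhere in this thread, shifted to 0-based positions on the assumption that pybel counts from 1. `fold_bits` is a made-up helper, and folding the Open Babel fingerprint this way would not be expected to reproduce the RDKit bit string, since the two toolkits hash differently.

```python
def fold_bits(set_bits, new_length):
    """Fold 0-based bit positions into a shorter fingerprint of new_length bits."""
    return sorted({b % new_length for b in set_bits})


# Set bits from the 4096-bit ECFP4 example, converted from 1-based to 0-based.
ob_bits = [556, 1348, 1509, 1547, 1993, 2078, 2089, 2378, 2487, 2531, 2700,
           3017, 3023, 3117, 3324, 3395, 3599, 4036]
zero_based = [b - 1 for b in ob_bits]

folded = fold_bits(zero_based, 1024)
print(len(zero_based), "bits before folding,", len(folded), "after")
```

Here two of the original bits (1993 and 3017) land on the same folded position, so one bit of information is lost.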
>>>
>>> 2-) How can I convert the ECFP4 obtained from OpenBabel in hexadecimal
>>> form to a bit string with only ones and zeros?
>>>
>>>
>>> What programming language are you using? For example in Python, a quick
>>> search on StackExchange:
>>>  https://stackoverflow.com/questions/1425493/convert-hex-to-binary
>>>
>>> Hope that helps,
>>> -Geoff
>>>
>> _______________________________________________
>> OpenBabel-discuss mailing list
>> OpenBabel-discuss@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/openbabel-discuss
>>
>