Re: [Open Babel] pybel calcfp() mismatch?

Andrew Dalke Fri, 15 May 2015 02:31:50 -0700

On May 14, 2015, at 6:10 PM, R. K. Belew wrote:
> i'm trying to make sure i understand just what `calcfp()` is
> doing, by trying to reconcile its two `fp` and `bits` return values.
> below is a short demo, showing i cannot.  i'm sure i'm missing
> something simple, but don't know what!


In your pybelBits2binary you treat everything as little-endian, so bit 1 is the 
furthest left, bit 2 is second from the left, and so on:

>>> pybelBits2binary([1])
'1000000000000 ....
>>> pybelBits2binary([2])
'0100000000000 ....
>>> pybelBits2binary([3])
'0010000000000 ....

The format(num, '032b') call formats a 32-bit integer value as big-endian, so 
within a 32-bit word the value 1 sets the rightmost bit, 2 sets the next 
rightmost, and so on:

>>> format(1, "032b")
'00000000000000000000000000000001'
>>> format(2, "032b")
'00000000000000000000000000000010'
>>> format(3, "032b")
'00000000000000000000000000000011'
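Put another way, bit k of a word (counting from 0 at the least significant end) lands at string index 31 - k of the format() output. A small sketch that instead emits the bits least significant first, using shifts; it should agree with simply reversing the format() string. The helper name is made up for illustration:

```python
def word_lsb_first(num, width=32):
    """Return the low `width` bits of num as a string, least significant bit first."""
    return "".join(str((num >> k) & 1) for k in range(width))


# Shifting bits out LSB-first matches reversing the MSB-first format() string.
for n in (1, 2, 3, 0xC0000008):
    assert word_lsb_first(n) == format(n, "032b")[::-1]

print(word_lsb_first(3, 8))  # 11000000
```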

You can see that both of the bit strings you generate have the same number of set bits:

>>> from collections import Counter
>>> Counter(bits1)
Counter({'0': 979, '1': 45})
>>> Counter(bits2)
Counter({'0': 979, '1': 45})
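The counts agree because both strings are just different arrangements of the same set bits. A quick sketch showing that the number of '1' characters depends only on the word values, not on the bit order used to print them (the example words are arbitrary, not taken from your molecule):

```python
words = [0x08000000, 0xC0000008, 0x00000006]  # arbitrary example words

msb_first = "".join(format(w, "032b") for w in words)        # big-endian per word
lsb_first = "".join(format(w, "032b")[::-1] for w in words)  # little-endian per word
popcount = sum(bin(w).count("1") for w in words)             # set bits in the words

print(msb_first.count("1"), lsb_first.count("1"), popcount)  # 6 6 6
```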

The only difference is the arrangement. Within each group of 32 bits, one 
fingerprint is the reverse of the other:

>>> bits1[:32]
'00000000000000000000000000010000'
>>> bits2[:32][::-1]
'00000000000000000000000000010000'

>>> bits1[32:64]
'00010000000000000000000000000011'
>>> bits2[63:31:-1]
'00010000000000000000000000000011'
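The same per-word reversal can be applied to a whole fingerprint string at once. A minimal sketch, with a hypothetical helper name, that reverses each 32-character chunk and so turns one layout into the other (the example words are arbitrary):

```python
def reverse_each_word(bitstring, width=32):
    """Reverse the characters inside each consecutive width-sized chunk."""
    return "".join(
        bitstring[i:i + width][::-1] for i in range(0, len(bitstring), width)
    )


words = [0x08000000, 0xC0000008]  # arbitrary example words
msb_first = "".join(format(w, "032b") for w in words)
lsb_first = "".join(format(w, "032b")[::-1] for w in words)

assert reverse_each_word(msb_first) == lsb_first
assert reverse_each_word(lsb_first) == msb_first  # the operation is its own inverse
```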

The simplest way to get the two fingerprints to match is to change how 'bits2' 
is created, so that the bits for each integer are in little-endian order. The 
current code is:

>>> bits2 = ''.join([format(num,'032b') for num in fp.fp])
>>> bits1 == bits2
False

and the change to make bits2 match bits1 is:

>>> bits2 = ''.join([format(num,'032b')[::-1] for num in fp.fp])
>>> bits1 == bits2
True

This will end up with a pure little-endian fingerprint.
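To check the whole argument without Open Babel installed, here is a self-contained sketch. It mimics how an on-bits list could be derived from the 32-bit words, assuming (as the pybelBits2binary examples suggest) that bit positions are numbered from 1, starting at the least significant bit of the first word. Both helpers are hypothetical stand-ins, not pybel's API, and the words are arbitrary:

```python
def on_bits_from_words(words, bits_per_word=32):
    """Assumed 1-based positions of set bits, scanning each word from its LSB."""
    on = []
    for w_index, word in enumerate(words):
        for k in range(bits_per_word):
            if (word >> k) & 1:
                on.append(w_index * bits_per_word + k + 1)
    return on


def on_bits_to_string(on_bits, nbits):
    """Place 1-based bit n at string index n - 1 (bit 1 leftmost)."""
    chars = ["0"] * nbits
    for b in on_bits:
        chars[b - 1] = "1"
    return "".join(chars)


words = [0x08000000, 0xC0000008, 0x00000006]  # arbitrary example words
bits1 = on_bits_to_string(on_bits_from_words(words), 32 * len(words))
bits2_old = "".join(format(num, "032b") for num in words)
bits2_new = "".join(format(num, "032b")[::-1] for num in words)

print(bits1 == bits2_old)  # False
print(bits1 == bits2_new)  # True
```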

By the way, if you want the byte-oriented version of the little-endian 
fingerprint, where the first byte contains the first 8 bits (in big-endian 
order), the second byte contains the next 8 bits, and so on, then you can use 
the struct module:

>>> import struct
>>> byte_fp = struct.pack("<" + "I"*32, *fp.fp)
>>> byte_fp
'\x00\x00\x00\x08\x08\x00\x00\xc0\x00 \x06\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"\x00\x00\x00\x00\x80\x00\x00\x00\x00\x04\x00@\x00\x00\x00\x00  \n\x02\x08\x00\x00\x00@\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x00\x08\x00\x08\x00@\x00\x02\x00\x00\x00\x00\x00\x10\x80\x00\x00\x00\x90\x00\xc0\x04\x00\x00\x00@\x08\x00\x00\x00\x01\x00@\x00\x00\x00\x00\x00\x00\x00\x00\x00@\x01\x00\x00\x00\x00\x00\x00\x02\x01\x00\x00\x01\x00\x00\x00\x88\x00@\x02@\x00\x00'

I'll construct a simple translation table to convert a byte into its 
little-endian bit-string representation:

>>> T = {chr(i): format(i, "08b")[::-1] for i in range(256)}

and use that table to show that the new byte_fp is the same as bits1.

>>> bits1 == "".join(T[byte] for byte in byte_fp)
True
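The session above is Python 2, where iterating over a packed string yields one-character strings. Under Python 3, struct.pack returns bytes and iterating over them yields integers, so a sketch of the same check would key the table by int. The example words are arbitrary; with a real fingerprint you would presumably pass the values from fp.fp instead:

```python
import struct

words = [0x08000000, 0xC0000008, 0x00000006]  # arbitrary example words

# "<" = little-endian byte order, one unsigned 32-bit "I" per word.
byte_fp = struct.pack("<%dI" % len(words), *words)

# Map each byte value (an int in Python 3) to its bits, least significant first.
T = {i: format(i, "08b")[::-1] for i in range(256)}

from_bytes = "".join(T[byte] for byte in byte_fp)
from_words = "".join(format(w, "032b")[::-1] for w in words)

print(from_bytes == from_words)  # True
```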


Best regards,

                                Andrew
                                da...@dalkescientific.com



_______________________________________________
OpenBabel-discuss mailing list
OpenBabel-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openbabel-discuss
