Hi Matthew,

On Fri, Jul 19, 2013 at 1:55 PM, Maciej Szymkiewicz
<[email protected]> wrote:

> Hello everyone,
>
> I am new here, so I'd like to begin by introducing myself. My name is
> Matthew and I'm a student of Bioinformatics at the University of Warsaw.
> I am also an RDKit newbie, so please be patient.
>

Welcome.


>
> Currently I’m working on a couple of small services using the PostgreSQL
> cartridge and the RDKit Python wrapper.
> Debian 3.9.6-1 x86_64 GNU/Linux
> PostgreSQL 9.1.9
> RDKit 2013_03_2
>
> I obtain quite different similarity values using Postgres and Python.
> For example, for the simple script available here:
> http://pastebin.com/M8j3dMCj (empty db named foo, cartridge installed
> with schema rdkit)
> I get output like below.
>
> Morgan: python = 0.145833333333, postgres = 0.179775280899
> RDKit : python = 0.427549194991, postgres = 0.485889570552
> MACCS : python = 0.597402597403, postgres = 0.597402597403
> Atompair : python = 0.21935483871, postgres = 0.322335025381
> Torsion (dice) : python = 0.102941176471, postgres = 0.246153846154
> Layered: python = 0.555211558308, postgres = 0.654569892473
>
> I assume it's mainly because of a difference in fingerprint size.
> I tried changing parameters on the Python side, but no luck so far.
> I would be grateful for any help.
>

It is indeed the fingerprint size.
Here are the size parameters used by the cartridge (
https://github.com/rdkit/rdkit/blob/master/Code/PgSQL/rdkit/adapter.cpp):

const unsigned int SSS_FP_SIZE=2048;
const unsigned int LAYERED_FP_SIZE=1024;
const unsigned int MORGAN_FP_SIZE=512;
const unsigned int HASHED_TORSION_FP_SIZE=1024;
const unsigned int HASHED_PAIR_FP_SIZE=2048;

The RDKit fingerprint uses LAYERED_FP_SIZE.
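
To see why the bit-vector length alone can move the similarity, here is a toy sketch in plain Python. It is not how RDKit hashes features: the feature IDs are made up, and folding by a simple modulo (the hypothetical helpers `fold` and `folded_tanimoto`) is an assumption chosen only to make bit collisions easy to follow.

```python
def fold(feature_ids, n_bits):
    """Fold integer feature IDs into a bit vector of length n_bits.

    The vector is represented as the set of on-bit positions.
    Toy assumption: position = feature ID modulo n_bits.
    """
    return {fid % n_bits for fid in feature_ids}


def tanimoto(bits_a, bits_b):
    """Tanimoto similarity of two sets of on-bit positions."""
    union = len(bits_a | bits_b)
    return len(bits_a & bits_b) / union if union else 0.0


def folded_tanimoto(features_a, features_b, n_bits):
    """Fold both feature sets to n_bits, then compare them."""
    return tanimoto(fold(features_a, n_bits), fold(features_b, n_bits))


# Made-up feature IDs for two hypothetical molecules.
features_a = {1, 513, 1030, 2000}
features_b = {1, 2, 6, 2000}

# At 2048 bits nothing collides: 2 shared bits out of 6 set in total.
print(folded_tanimoto(features_a, features_b, 2048))  # 0.3333333333333333

# At 512 bits, 513 lands on bit 1 and 1030 lands on bit 6, so the two
# vectors share 3 bits out of 4 and the similarity goes up.
print(folded_tanimoto(features_a, features_b, 512))   # 0.75
```

The same features give different scores at different lengths, so the size has to match on both sides before the numbers can agree.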

Here's a sample with Morgan:
In [14]: DataStructs.FingerprintSimilarity(
   ....:     *[rdMolDescriptors.GetMorganFingerprintAsBitVect(Chem.MolFromSmiles(s), 2, nBits=512)
   ....:       for s in smi])
Out[14]: 0.1797752808988764

and with AtomPairs:
In [15]: DataStructs.FingerprintSimilarity(
   ....:     *[rdMolDescriptors.GetHashedAtomPairFingerprintAsBitVect(Chem.MolFromSmiles(s), nBits=2048)
   ....:       for s in smi])
Out[15]: 0.32233502538071068

The rest is left as an exercise for the reader. ;-)

Seriously, let us know if you need more info.

-greg
_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss