Hi Gurus,
I'm completely new to the cheminformatics domain. I've been assigned a PoC in
which I have to compare RDKit in Python with RDKit on PostgreSQL. I've installed
both and am trying some hands-on exercises to understand the differences. My
understanding so far is that structure searches are slower in Python (on a Spark
cluster) than in the PostgreSQL database. Please correct me if I'm wrong; I'm a
newbie in this area and may be missing something obvious.
For example, a similarity search using the Python methods below:

    fps = FingerprintMols.FingerprintMol(Chem.MolFromSmiles(smile_structure, sanitize=False))
    similarity = DataStructs.TanimotoSimilarity(fps1, fps2)

takes too long (45 minutes) for a file of 2 million records, while the same
search is very quick (seconds) using the PostgreSQL database functions:

    select count(*)
    from (
        select modality_id, m,
               tanimoto_sml(morganbv_fp(mol_from_smiles('CCOC(=O)c1cc2cc(ccc2[nH]1)C(=O)O'::cstring)), mfp2) as similarity
        from fingerprints
        join mols using (modality_id)
    ) as fps
    where similarity between 0.45 and 0.50;

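For reference, here is a minimal sketch of a Python-side search closer to what the SQL query does. It computes each fingerprint once and then scores the query against all of them in a single bulk call. It is only an illustration under several assumptions: the short `smiles_list` is a hypothetical stand-in for the 2 million record file, Morgan fingerprints of radius 2 are assumed to correspond to `morganbv_fp`, and the 2048-bit size is an arbitrary choice that may differ from the cartridge's setting, so the scores need not match the database values.

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import rdFingerprintGenerator

# Hypothetical stand-in for the 2 million record file.
smiles_list = [
    "CCOC(=O)c1cc2cc(ccc2[nH]1)C(=O)O",
    "OC(=O)c1ccc2[nH]ccc2c1",
    "c1ccccc1C(=O)O",
    "CCO",
]
query_smiles = "CCOC(=O)c1cc2cc(ccc2[nH]1)C(=O)O"

# Assumption: radius 2 and 2048 bits; the cartridge may be configured differently.
generator = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=2048)

def morgan_fp(smiles):
    """Return a Morgan bit-vector fingerprint, or None if the SMILES cannot be parsed."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None
    return generator.GetFingerprint(mol)

# Build every fingerprint once, up front, and drop unparsable entries.
fps = [fp for fp in (morgan_fp(s) for s in smiles_list) if fp is not None]

# Score the query against all stored fingerprints in one call.
query_fp = morgan_fp(query_smiles)
sims = DataStructs.BulkTanimotoSimilarity(query_fp, fps)

# Same filter as the SQL "between 0.45 and 0.50" (inclusive on both ends).
count = sum(1 for s in sims if 0.45 <= s <= 0.50)
print(count)
```

The design point is that fingerprint generation happens once per molecule rather than once per comparison, which is what the `fingerprints` table with its precomputed `mfp2` column provides on the database side.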
Does this mean that for production workloads one should always use a database
cartridge, such as RDKit, BINGO, etc.?
Regards,
DA
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
