Re: [Rdkit-discuss] GIL Lock in BulkTanimotoSimilarity

2022-10-26 Thread David Cosgrove
Thanks for the reference. That sort of bounds screening would probably work well in the C++ layer for the bulk similarity functions. My initial experiments without bounds screening found that doing individual similarity calculations in Python was a lot slower than the bulk function because moving

Re: [Rdkit-discuss] GIL Lock in BulkTanimotoSimilarity

2022-10-26 Thread David Cosgrove
I would be very surprised if speed of fingerprint similarity was the limiting factor on a distance-matrix-based clustering method. Normally they are constrained by memory requirements. In this case I am using the MaxMin picker in RDKit to generate the cluster “centroids” and am wanting to fill

Re: [Rdkit-discuss] GIL Lock in BulkTanimotoSimilarity

2022-10-25 Thread S Joshua Swamidass
I wonder if there is a way to make use of PyTorch or tensorflow to do this on a GPU. That’s where some big speed ups might be found. Also, consider using these bounds. They do make a big difference in many cases. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2527184/ On Tue, Oct 25, 2022 at
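
A minimal sketch of what bounds screening could look like, for readers following the thread. It uses only the simplest bound of this kind, which follows directly from the definition: for bit counts a and b, Tanimoto(A, B) <= min(a, b) / max(a, b), so any candidate whose bound falls below the search threshold can be skipped without computing the intersection. Fingerprints are modelled as plain Python ints here, and the names popcount, tanimoto and search_with_bound are hypothetical, not RDKit functions.

```python
def popcount(fp):
    """Number of set bits in a fingerprint stored as a Python int."""
    return bin(fp).count("1")


def tanimoto(a, b):
    """Tanimoto similarity of two bit fingerprints; 0.0 if both are empty."""
    union = popcount(a | b)
    return popcount(a & b) / union if union else 0.0


def search_with_bound(query, database, threshold):
    """Return (hits, skipped); hits are (index, similarity) with similarity >= threshold."""
    query_bits = popcount(query)
    hits = []
    skipped = 0
    for idx, fp in enumerate(database):
        fp_bits = popcount(fp)
        lo, hi = min(query_bits, fp_bits), max(query_bits, fp_bits)
        # Upper bound: the intersection has at most `lo` bits and the union
        # has at least `hi` bits, so the similarity cannot exceed lo / hi.
        bound = lo / hi if hi else 0.0
        if bound < threshold:
            skipped += 1  # cannot reach the threshold; skip the full calculation
            continue
        sim = tanimoto(query, fp)
        if sim >= threshold:
            hits.append((idx, sim))
    return hits, skipped


if __name__ == "__main__":
    db = [0b1111, 0b0011, 0b0001, 0b11111111, 0b1110]
    print(search_with_bound(0b1111, db, threshold=0.7))
```

The screen never changes the result set; it only avoids work. In practice the bit counts would be precomputed once per fingerprint rather than recounted for every query.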

Re: [Rdkit-discuss] GIL Lock in BulkTanimotoSimilarity

2022-10-25 Thread Francois Berenger
On 24/10/2022 19:47, David Cosgrove wrote: For the record, I have attempted this, but got only a marginal speed-up (130% of CPU used, with any number of threads above 2). The procedure I used was to extract the fingerprint pointers into a std::vector, create a std::vector for the results,

Re: [Rdkit-discuss] GIL Lock in BulkTanimotoSimilarity

2022-10-24 Thread David Cosgrove
For the record, I have attempted this, but got only a marginal speed-up (130% of CPU used, with any number of threads above 2). The procedure I used was to extract the fingerprint pointers into a std::vector, create a std::vector for the results, unlock the GIL to do the bulk Tanimoto

Re: [Rdkit-discuss] GIL Lock in BulkTanimotoSimilarity

2022-10-22 Thread David Cosgrove
Hi Greg, Thanks for the pointer. I’ll take a look. If it could go in the next patch release that would be really useful. Dave On Sat, 22 Oct 2022 at 10:52, Greg Landrum wrote: > > Hi Dave, > > We have multiple examples of this in the code, here’s one: > >

Re: [Rdkit-discuss] GIL Lock in BulkTanimotoSimilarity

2022-10-22 Thread Greg Landrum
Hi Dave, We have multiple examples of this in the code, here’s one: https://github.com/rdkit/rdkit/blob/b208da471f8edc88e07c77ed7d7868649ac75100/Code/GraphMol/ForceFieldHelpers/Wrap/rdForceFields.cpp#L40 I’m not sure how this would interact with the call to Python::extract that’s in the bulk

[Rdkit-discuss] GIL Lock in BulkTanimotoSimilarity

2022-10-22 Thread David Cosgrove
Hi, I'm doing a lot of Tanimoto similarity calculations on large datasets using BulkTanimotoSimilarity. It is an obvious candidate for parallelisation, so I am using concurrent.futures to do so. If I use ProcessPoolExecutor, I get good speed-up but each process needs a copy of the fingerprint
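
For context, the chunk-and-map pattern described in this post might look roughly like the sketch below. It is not the poster's code: bulk_tanimoto is a hypothetical pure-Python stand-in for a bulk similarity call (fingerprints are plain ints), and parallel_bulk_tanimoto, n_workers and chunk_size are illustrative names.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def bulk_tanimoto(query, fps):
    """Hypothetical stand-in for a bulk call: one query against many fingerprints."""
    def popcount(x):
        return bin(x).count("1")

    sims = []
    for fp in fps:
        union = popcount(query | fp)
        sims.append(popcount(query & fp) / union if union else 0.0)
    return sims


def parallel_bulk_tanimoto(query, fps, n_workers=4, chunk_size=1000):
    """Split fps into chunks, score each chunk in a worker, and rejoin in order."""
    chunks = [fps[i:i + chunk_size] for i in range(0, len(fps), chunk_size)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Executor.map yields results in the order the chunks were submitted.
        per_chunk = pool.map(partial(bulk_tanimoto, query), chunks)
        return [sim for chunk in per_chunk for sim in chunk]


if __name__ == "__main__":
    fingerprints = list(range(1, 5000))
    sims = parallel_bulk_tanimoto(0b1011, fingerprints, n_workers=4, chunk_size=500)
    print(len(sims), max(sims))
```

Threads share the fingerprint list without copying it, but they only run concurrently if the scoring function releases the GIL; the pure-Python stand-in above does not, so it shows the structure rather than a speed-up. Swapping in ProcessPoolExecutor gives real parallelism, at the cost of sending each chunk of fingerprints to a worker process.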