I wonder if there is a way to use PyTorch or TensorFlow to do this on a
GPU. That’s where the big speed-ups might be found.
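
To make the idea concrete, here is a rough sketch of the all-pairs Tanimoto calculation written as a single matrix product, which is the form a GPU handles well. It uses NumPy so it runs anywhere; I am assuming the same expression would carry over to PyTorch tensors moved to a GPU, but I have not benchmarked that. The function name and the dense 0/1 matrix layout are my own choices, not anything from RDKit.

```python
import numpy as np

def tanimoto_matrix(queries, targets):
    """All-pairs Tanimoto between two dense 0/1 fingerprint matrices.

    queries has shape (n, d), targets has shape (m, d); returns (n, m).
    """
    q = np.asarray(queries, dtype=np.float32)
    t = np.asarray(targets, dtype=np.float32)
    common = q @ t.T                   # bits set in both, shape (n, m)
    q_bits = q.sum(axis=1)[:, None]    # bits set per query, shape (n, 1)
    t_bits = t.sum(axis=1)[None, :]    # bits set per target, shape (1, m)
    union = q_bits + t_bits - common
    out = np.zeros_like(common)
    # Two empty fingerprints give union == 0; they are left at 0 here,
    # which is a convention chosen for this sketch only.
    np.divide(common, union, out=out, where=union > 0)
    return out

fps = np.array([[1, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 0, 0, 0]])
sims = tanimoto_matrix(fps, fps)
```

The catch is memory: a dense float matrix for 200K fingerprints is far larger than the packed bit vectors, so in practice the targets would have to be processed in blocks.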

Also, consider using the bounds described in the paper linked below. They
make a big difference in many cases.

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2527184/
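
As an illustration of how such a bound helps, here is a minimal sketch assuming the simplest bit-count bound: if fingerprints A and B have a and b bits set, their Tanimoto similarity cannot exceed min(a, b) / max(a, b), because the intersection is at most min(a, b). The paper may well give tighter bounds than this one. Fingerprints are plain Python ints here, and the function names are hypothetical.

```python
def popcount(fp):
    """Number of set bits in an int used as a bit vector."""
    return bin(fp).count("1")

def tanimoto(a, b):
    common = popcount(a & b)
    union = popcount(a) + popcount(b) - common
    return common / union if union else 0.0

def search_with_bound(query, targets, threshold):
    """Return hits with Tanimoto >= threshold and the number of skips."""
    q_bits = popcount(query)
    hits = []
    skipped = 0
    for i, fp in enumerate(targets):
        t_bits = popcount(fp)  # would be precomputed once in real use
        lo, hi = min(q_bits, t_bits), max(q_bits, t_bits)
        bound = lo / hi if hi else 0.0
        if bound < threshold:
            # The similarity cannot reach the threshold, so skip it.
            skipped += 1
            continue
        sim = tanimoto(query, fp)
        if sim >= threshold:
            hits.append((i, sim))
    return hits, skipped

query = 0b11110000
targets = [0b11110000, 0b11100000, 0b00000001, 0b11111111]
hits, skipped = search_with_bound(query, targets, 0.7)
```

This only pays off for threshold searches; a full distance matrix still needs every pair.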


On Tue, Oct 25, 2022 at 8:57 PM Francois Berenger <mli...@ligand.eu> wrote:

> On 24/10/2022 19:47, David Cosgrove wrote:
> > For the record, I have attempted this, but got only a marginal
> > speed-up (130% of CPU used, with any number of threads above 2).  The
> > procedure I used was to extract the fingerprint pointers into a
> > std::vector, create a std::vector for the results, unlock the GIL to
> > do the bulk Tanimoto calculation, then re-lock the GIL to copy the
> > results from the std::vector into the python:list for output.  I guess
> > the extra overhead to create and populate the additional std::vectors
> > destroyed any potential speedup.  This was on a vector of 200K
> > fingerprints, which suggests that the Tanimoto calculation is a small
> > part of the overall time.  It doesn't seem worth pursuing further.
>
> There is probably code on GitHub doing this in parallel already.
> Think about it: any clustering algorithm using a distance matrix.
> I guess many people want to initialize the Gram matrix in parallel.
>
> I wouldn't be surprised if, for example, chemfp has such code.
>
> > Dave
> >
> > On Sat, Oct 22, 2022 at 11:28 AM David Cosgrove
> > <davidacosgrov...@gmail.com> wrote:
> >
> >> Hi Greg,
> >> Thanks for the pointer. I’ll take a look. If it could go in the
> >> next patch release that would be really useful.
> >> Dave
> >>
> >> On Sat, 22 Oct 2022 at 10:52, Greg Landrum <greg.land...@gmail.com>
> >> wrote:
> >>
> >> Hi Dave,
> >>
> >> We have multiple examples of this in the code, here’s one:
> >>
> >>
> >> https://github.com/rdkit/rdkit/blob/b208da471f8edc88e07c77ed7d7868649ac75100/Code/GraphMol/ForceFieldHelpers/Wrap/rdForceFields.cpp#L40
> >>
> >> I’m not sure how this would interact with the call to
> >> Python::extract that’s in the bulk functions though
> >>
> >> It might be better to handle the multithreading on the C++ side by
> >> adding an optional nThreads argument to the bulk similarity
> >> functions. (Though this would have to wait for the next release
> >> since it’s a feature addition… we can declare releasing the GIL
> >> as a bug fix)
> >>
> >> -greg
> >>
> >> On Sat, 22 Oct 2022 at 09:48, David Cosgrove
> >> <davidacosgrov...@gmail.com> wrote:
> >>
> >> Hi,
> >>
> >> I'm doing a lot of Tanimoto similarity calculations on large
> >> datasets using BulkTanimotoSimilarity.  It is an obvious candidate
> >> for parallelisation, so I am using concurrent.futures to do so.  If
> >> I use ProcessPoolExecutor, I get good speed-up but each process
> >> needs a copy of the fingerprint set and for the sizes I'm dealing
> >> with that uses too much memory.  With ThreadPoolExecutor I only need
> >> 1 copy of the fingerprints, but the GIL means it only runs on 1
> >> thread at a time so there's no gain.  Would it be possible to amend
> >> the C++ BulkTanimotoSimilarity to free the GIL whilst it's doing the
> >> calculation, and recapture it afterwards?  I understand things like
> >> numpy do this for some of their functions.  I'm happy to attempt it
> >> myself if someone who knows about these things can advise that it
> >> could be done, it would help, and they could provide a few pointers.
> >>
> >> Thanks,
> >> Dave
> >>
> >> --
> >>
> >> David Cosgrove
> >> Freelance computational chemistry and chemoinformatics developer
> >> http://cozchemix.co.uk
> >>
> >> _______________________________________________
> >> Rdkit-discuss mailing list
> >> Rdkit-discuss@lists.sourceforge.net
> >> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
>
>