Hi Lewis,

The standard shape alignment algorithm that everyone uses is from Grant & Pickup 1996 (https://onlinelibrary.wiley.com/doi/abs/10.1002/%28SICI%291096-987X%2819961115%2917%3A14%3C1653%3A%3AAID-JCC7%3E3.0.CO%3B2-K).
It’s a Taylor-series-like expansion (in effect an inclusion-exclusion series) using spherical Gaussians as stand-ins for hard spheres: you take the atomic volumes, subtract off the pairwise overlaps, add back in the three-way overlaps, subtract off the four-way overlaps, and so on. I did a fair few tests some years back and you really need to go to 6 terms to get decent accuracy. However, all of the commercial algorithms (ROCS, Phase Shape, etc.) seem to truncate at 2, so go figure. OTOH the “high throughput” versions all seem to be operated with ludicrously low numbers of conformations, so the error from incomplete coverage of conformer space dwarfs the 5% noise that you get from truncating at 2 terms rather than 6.

If you want something slightly more accurate at the same computational cost, look at WEGA (https://onlinelibrary.wiley.com/doi/abs/10.1002/jcc.23603 and references therein), which heuristically corrects for some flaws in the truncated Grant & Pickup calculations.

If you want a fast GPU-accelerated version, then forget about actually applying the algorithm directly[*]. Instead, to compare a reference molecule A to a database molecule B, precompute a grid over A in which each point holds the pairwise overlap of A with an atom placed at that point. You can then compute the shape overlap for a given orientation of B by a simple 3D texture lookup per atom of B, rather than faffing around trying to compute exponential functions. This is simplified by assuming that all atoms have the same atomic radius and neglecting hydrogens (we’re going for speed over accuracy here, remember?). You can get a similar lookup texture for gradients, I think. One thing GPUs are really good at is texture lookups and interpolation. They’re less good at evaluating exponential functions. Your GPU algorithm is then a massively parallel CG or NR optimiser, with the objective function computing shape overlap values for as many molecules as you can cram into GPU memory, all in parallel.
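
To make the grid-lookup idea above concrete, here is a rough CPU-only sketch in plain Python. It is an illustration under stated assumptions, not anyone's production code: the amplitude 2.7 and the single 1.7 Å radius are assumed values, the function names (`pair_overlap`, `build_grid`, `lookup`, `grid_overlap`) are made up for this example, and only the pairwise (second-order) term is included. The trilinear interpolation in `lookup` stands in for what a GPU texture unit would do in hardware.

```python
import math

# Assumptions (not from the thread): amplitude P = 2.7 is a commonly quoted
# choice for Gaussian atoms, and every heavy atom gets one radius of 1.7 A.
P = 2.7
RADIUS = 1.7
# Exponent chosen so that one Gaussian integrates to the hard-sphere volume.
ALPHA = math.pi * (3.0 * P / (4.0 * math.pi)) ** (2.0 / 3.0) / RADIUS ** 2


def pair_overlap(a, b):
    """Overlap integral of two identical spherical Gaussians centred at a and b."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return P * P * (math.pi / (2.0 * ALPHA)) ** 1.5 * math.exp(-0.5 * ALPHA * d2)


def direct_overlap(mol_a, mol_b):
    """Pairwise-only (second-order) overlap, evaluated with exponentials."""
    return sum(pair_overlap(a, b) for a in mol_a for b in mol_b)


def build_grid(mol_a, spacing=0.5, margin=4.0):
    """Tabulate the overlap of A with a probe atom placed at each grid point."""
    lo = [min(atom[k] for atom in mol_a) - margin for k in range(3)]
    hi = [max(atom[k] for atom in mol_a) + margin for k in range(3)]
    n = [int(math.ceil((hi[k] - lo[k]) / spacing)) + 1 for k in range(3)]
    grid = [[[direct_overlap(mol_a, [(lo[0] + i * spacing,
                                      lo[1] + j * spacing,
                                      lo[2] + k * spacing)])
              for k in range(n[2])]
             for j in range(n[1])]
            for i in range(n[0])]
    return lo, spacing, n, grid


def lookup(grid_data, point):
    """Trilinear interpolation of the grid at point (0.0 outside the grid)."""
    lo, spacing, n, grid = grid_data
    idx, frac = [], []
    for k in range(3):
        t = (point[k] - lo[k]) / spacing
        if t < 0.0 or t > n[k] - 1:
            return 0.0  # beyond the margin the overlap is treated as zero
        i = min(int(t), n[k] - 2)  # keep i + 1 inside the grid
        idx.append(i)
        frac.append(t - i)
    total = 0.0
    for di in (0, 1):
        for dj in (0, 1):
            for dk in (0, 1):
                w = ((frac[0] if di else 1.0 - frac[0])
                     * (frac[1] if dj else 1.0 - frac[1])
                     * (frac[2] if dk else 1.0 - frac[2]))
                total += w * grid[idx[0] + di][idx[1] + dj][idx[2] + dk]
    return total


def grid_overlap(grid_data, mol_b):
    """Approximate overlap of A with B: one grid lookup per atom of B."""
    return sum(lookup(grid_data, b) for b in mol_b)


if __name__ == "__main__":
    # Made-up heavy-atom coordinates in angstroms, for illustration only.
    mol_a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.2, 1.2, 0.0)]
    mol_b = [(0.3, 0.2, 0.1), (1.6, 0.4, -0.2)]
    print(direct_overlap(mol_a, mol_b), grid_overlap(build_grid(mol_a), mol_b))
```

The grid is built once per reference molecule, after which each candidate pose of B costs one interpolated lookup per atom and no exponentials. The price is interpolation error, which shrinks as the grid spacing is reduced and comes on top of the truncation error discussed above.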
[*] gWEGA (I believe) is a GPU-accelerated version of the standard WEGA algorithm and, based on the published timings, is an order of magnitude or more slower than FastROCS.

Having said all of that, our GPU-accelerated shape similarity function just brute-forces through the overlap series to sixth order, as (a) my happy place is on the accuracy side of the speed/accuracy tradeoff, and (b) our electrostatic similarity calculations are sufficiently complex that making the shape function faster wouldn’t be that much of a net win. As a result, take all of the above with a grain of salt 😊.

Regards,
Mark

--
Mark Mackey
Chief Scientific Officer
Cresset
New Cambridge House, Bassingbourn Road, Litlington, Cambridgeshire, SG8 0SS, UK
tel: +44 (0)1223 858890 mobile: +44 (0)7595 099165 fax: +44 (0)1223 853667
email: m...@cresset-group.com web: www.cresset-group.com skype: mark_cresset

From: Lewis Martin <lewis.marti...@gmail.com>
Sent: 03 November 2020 19:27
To: RDKit Discuss <rdkit-discuss@lists.sourceforge.net>
Subject: Re: [Rdkit-discuss] GPU Implementation of shape-based 3D overlap on rdkit?

I've had an initial go at something like this using JAX. I chose JAX since it has a shallow learning curve, essentially being numpy on a GPU. This is great for vectorized calculations, but less so for applications that involve a lot of control flow (i.e. if/else statements), which, as I understand it, most point cloud registration algorithms use, such as iterative closest point or anything available in Open3D.

No guarantee I'll make any progress of course, but would someone mind recommending a paper explaining a nice subshape alignment algorithm?

Thanks :)
Lewis

On Wed, 4 Nov 2020 at 3:52 am, Andy Jennings <andy.j.jenni...@gmail.com> wrote:

Hi Greg,

Thanks for the response and background. Here's hoping someone is smart enough to code this up and generous enough to donate it back to the community.
Best,
Andy

On Mon, Nov 2, 2020 at 8:52 PM Greg Landrum <greg.land...@gmail.com> wrote:

Hi Andy,

At the moment the RDKit doesn't have either high-quality shape-based alignment code[1] or GPU support. I think having good shape-based alignment available would be a really useful complement to the Open3DAlign code that's already there, but it's certainly not a small project.

-greg

[1] The Python implementation of the subshape alignment algorithm is essentially just a proof-of-concept and not performant enough for real usage.

On Mon, Nov 2, 2020 at 7:16 PM Andy Jennings <andy.j.jenni...@gmail.com> wrote:

Hi,

I see that back in 2014 there was some discussion of using CUDA inside of RDKit and how it may be possible to produce a FastROCS-like open source alternative. I was curious if anyone had made such a breakthrough. Since GPU availability is now so common, and datasets are becoming so large, I figured that more and more people would be thinking RDKit + GPU = :-)

Thanks in advance.
Andy

_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss

--
Sent from Gmail Mobile