Peter,
  If you have chemfp and can make a chemfp arena, RDKit now supports these
structures for reading and searching.  This is by far the fastest way I
know of to do similarity searching.  I believe that Greg's implementation is
compatible with chemfp 1.0, which is available on PyPI:

https://pypi.python.org/pypi/chemfp/1.0

In my copious spare time, I've been trying to think of ways to embed this
directly in a pandas DataFrame. However, using them side by side is
certainly doable.
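
For the plain-RDKit route Peter asked about, here is a minimal sketch of the
side-by-side idea: keep the fingerprints in an ordinary list next to the
DataFrame and score them all in one call with
DataStructs.BulkTanimotoSimilarity instead of a per-row apply. The toy SMILES,
the 'smiles' and 'sim' column names, and the choice of the first molecule as
the query are assumptions made for illustration; this does not use chemfp.

```python
import pandas as pd
from rdkit import Chem, DataStructs
from rdkit.Chem import rdMolDescriptors

# Toy data; the SMILES strings and column names are illustrative only.
smiles = ['CCO', 'CCN', 'c1ccccc1O', 'c1ccccc1N', 'CCCO']
df = pd.DataFrame({'smiles': smiles})
df['ROMol'] = [Chem.MolFromSmiles(s) for s in smiles]

# Keep the fingerprints in a plain list, in the same order as the rows.
fps = [rdMolDescriptors.GetMorganFingerprintAsBitVect(m, 2) for m in df['ROMol']]

# Assumption: use the first molecule as the query.
qry = fps[0]

# One bulk call returns a list of Tanimoto similarities, one per fingerprint.
sims = DataStructs.BulkTanimotoSimilarity(qry, fps)

# Attach the scores to the DataFrame and filter on them.
df['sim'] = sims
ndf = df[df['sim'] >= 0.7]
```

Because the list and the DataFrame share row order, the scores can be assigned
straight back as a column; re-sort or filter the DataFrame only after that
assignment.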

Cheers,
 Brian


On Wed, Nov 23, 2016 at 10:06 AM, Peter Gedeck <peter.ged...@gmail.com>
wrote:

> Is it possible to use the bulk similarity searching functionality for
> better performance instead of the list comprehension?
>
> Best,
>
> Peter
>
>
> On Wed, Nov 23, 2016 at 9:11 AM Greg Landrum <greg.land...@gmail.com>
> wrote:
>
> No worries.
> This, and Anna's question about similarity searching and clustering,
> illustrate a great opportunity for a tutorial on fingerprints and
> similarity searching.
>
> -greg
>
>
>
>
>
> On Wed, Nov 23, 2016 at 3:00 PM +0100, "Chris Swain" <sw...@mac.com>
> wrote:
>
> Thanks for this,
>
> As a chemist who comes from the “cut and paste” school of scripting I’m
> always concerned I’m asking something blindingly obvious
>
> ;-)
>
> Chris
>
> On 23 Nov 2016, at 12:36, Greg Landrum <greg.land...@gmail.com> wrote:
>
> [including rdkit-discuss, because it's relevant there and I'm pretty sure
> Chris won't mind and the real Pandas experts may have a better answer than
> me.]
>
> On Wed, Nov 23, 2016 at 9:51 AM, Chris Swain <sw...@mac.com> wrote:
>
>
> I quite like storing molecules and associated data in a data frame and
> I’ve seen that it is possible to use rdkit for substructure searching. Is it
> possible to also do similarity searching?
>
>
> It's not built in since there are many possible fingerprints that could be
> used.
>
> It's not quite as convenient as the substructure search, but here's a
> little demo of what you can do to filter based on similarity:
>
> # Start by adding a fingerprint column:
> In [18]: df['mfp2'] = [rdMolDescriptors.GetMorganFingerprintAsBitVect(x,2) for x in df['ROMol']]
>
> # and now filter:
> In [21]: ndf = df[df.apply(lambda x: DataStructs.TanimotoSimilarity(x['mfp2'],qry)>=0.7, axis=1)]
>
> In [23]: len(df)
> Out[23]: 1000
> In [24]: len(ndf)
> Out[24]: 2
>
> -greg
>
>
> ------------------------------------------------------------------------------
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss