Thanks for this.

As a chemist who comes from the “cut and paste” school of scripting, I’m always concerned I’m asking something blindingly obvious ;-)

Chris

> On 23 Nov 2016, at 12:36, Greg Landrum <greg.land...@gmail.com> wrote:
>
> [including rdkit-discuss, because it's relevant there and I'm pretty sure
> Chris won't mind and the real Pandas experts may have a better answer than
> me.]
>
> On Wed, Nov 23, 2016 at 9:51 AM, Chris Swain <sw...@mac.com> wrote:
>
> > I quite like storing molecules and associated data in a data frame and
> > I’ve seen that it is possible to use rdkit for substructure searching.
> > Is it also possible to do similarity searching?
>
> It's not built in since there are many possible fingerprints that could be
> used.
>
> It's not quite as convenient as the substructure search, but here's a little
> demo of what you can do to filter based on similarity:
>
> # Start by adding a fingerprint column:
> In [18]: df['mfp2'] = [rdMolDescriptors.GetMorganFingerprintAsBitVect(x,2) for x in df['ROMol']]
>
> # and now filter:
> In [21]: ndf = df[df.apply(lambda x: DataStructs.TanimotoSimilarity(x['mfp2'],qry)>=0.7, axis=1)]
>
> In [23]: len(df)
> Out[23]: 1000
> In [24]: len(ndf)
> Out[24]: 2
>
> -greg
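
For anyone else reading from the cut-and-paste school: the quoted demo does not show how `qry` was built, so the sketch below does not try to reproduce it. It only illustrates what the filter step computes, a Tanimoto score compared against a 0.7 cutoff. It is plain Python with no RDKit or pandas, and it assumes each fingerprint can be stood in for by a set of on-bit indices. The names `tanimoto`, `filter_similar`, `rows`, and `query_bits` are made up for illustration.

```python
def tanimoto(a, b):
    """Tanimoto coefficient for two sets of on-bit indices: |a & b| / |a | b|."""
    union = len(a | b)
    # Two empty fingerprints share nothing; treat that as zero similarity.
    return len(a & b) / union if union else 0.0


def filter_similar(rows, query_bits, threshold=0.7):
    """Keep only the rows whose fingerprint scores at or above the threshold."""
    return [row for row in rows if tanimoto(row["bits"], query_bits) >= threshold]


# Toy "fingerprints": hypothetical on-bit sets, not real Morgan fingerprints.
rows = [
    {"name": "a", "bits": {1, 2, 3, 4, 5}},  # shares 4 of 5 bits with the query
    {"name": "b", "bits": {1, 2, 3, 6}},     # shares 3 of 5 bits with the query
    {"name": "c", "bits": {7, 8, 9}},        # shares nothing with the query
]
query_bits = {1, 2, 3, 4}

hits = filter_similar(rows, query_bits)
print([row["name"] for row in hits])
```

In the DataFrame version quoted above, the same comparison is applied row by row via `df.apply(..., axis=1)`, with `DataStructs.TanimotoSimilarity` doing the scoring on RDKit bit vectors.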