Thanks for this,

As a chemist who comes from the “cut and paste” school of scripting, I’m always
concerned I’m asking something blindingly obvious.

;-)

Chris

> On 23 Nov 2016, at 12:36, Greg Landrum <greg.land...@gmail.com> wrote:
> 
> [including rdkit-discuss, because it's relevant there and I'm pretty sure 
> Chris won't mind and the real Pandas experts may have a better answer than 
> me.]
> 
> On Wed, Nov 23, 2016 at 9:51 AM, Chris Swain <sw...@mac.com> wrote:
> 
> I quite like storing molecules and associated data in a data frame, and I’ve
> seen that it is possible to use RDKit for substructure searching. Is it
> possible to also do similarity searching?
> 
> It's not built in since there are many possible fingerprints that could be 
> used.
> 
> It's not quite as convenient as the substructure search, but here's a little 
> demo of what you can do to filter based on similarity:
> 
> # Start by adding a fingerprint column:
> In [18]: df['mfp2'] = [rdMolDescriptors.GetMorganFingerprintAsBitVect(x, 2) for x in df['ROMol']]
> 
> # and now filter (qry is not defined in this snippet; it is assumed to be
> # the query molecule's fingerprint, generated the same way as 'mfp2'):
> In [21]: ndf = df[df.apply(lambda x: DataStructs.TanimotoSimilarity(x['mfp2'], qry) >= 0.7, axis=1)]
> 
> In [23]: len(df)
> Out[23]: 1000
> In [24]: len(ndf)
> Out[24]: 2
> 
> -greg
> 
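For anyone wondering what the filter above is actually computing: the Tanimoto similarity of two bit-vector fingerprints is the number of bits set in both divided by the number of bits set in either. The sketch below shows that calculation and the same ">= 0.7" filter in plain Python, with no RDKit or pandas required. Everything in it is illustrative: the `tanimoto` helper, the `rows` list, and the bit indices are made up for the example (they are not real Morgan fingerprint bits), and returning 0.0 for two empty fingerprints is just a convention chosen here.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    a, b = set(a), set(b)
    union = len(a | b)
    if union == 0:
        # Convention chosen for this sketch: two empty fingerprints score 0.0.
        return 0.0
    return len(a & b) / union


# Hypothetical toy "fingerprints": made-up bit indices, not real Morgan bits.
rows = [
    {"name": "mol_a", "fp": {1, 4, 7, 9}},
    {"name": "mol_b", "fp": {1, 4, 7, 9, 12}},
    {"name": "mol_c", "fp": {2, 3, 5}},
]

# The query fingerprint plays the role of `qry` in the quoted message.
qry = {1, 4, 7, 9}

# Keep only the rows whose similarity to the query is at least 0.7,
# mirroring the df.apply(...) filter in the quoted message.
hits = [r["name"] for r in rows if tanimoto(r["fp"], qry) >= 0.7]
print(hits)
```

Here mol_a matches the query exactly (similarity 1.0), mol_b shares 4 of 5 bits (0.8), and mol_c shares none (0.0), so only the first two pass the threshold.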

------------------------------------------------------------------------------
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
