Hello

> given a data set of let's say 2000 compounds, how do I extract the most
> common substructures rather than the maximum common substructures?
> In addition, I would like to output the frequency of the found

One approach would be to run a BRICS decomposition in which you keep the
full decomposition hierarchy of each structure. You can then just count the
fragments across your data set to get the frequencies. As the BRICS
decomposition is implemented in Python it's not particularly fast (I mean
not at interactive speed), but for 2000 compounds it's OK.

The nice thing about the BRICS fragments is that chemists will like them.
I would terminate the decomposition at a fragment size of 3 to avoid
getting single atoms. Check the arguments of the BRICS decomposition
function for ways to do this.
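
A minimal sketch of the counting idea, in case it helps. It assumes the
compounds are available as SMILES strings and uses RDKit's
BRICS.BRICSDecompose with its keepNonLeafNodes and minFragmentSize
arguments. The helper names brics_fragments and fragment_frequencies are
made up for this example, and each fragment is counted once per compound,
which is only one possible definition of "frequency".

```python
from collections import Counter


def brics_fragments(smiles, min_fragment_size=3):
    """Return the set of BRICS fragment SMILES for one compound.

    Hypothetical helper: returns an empty set if the SMILES cannot be parsed.
    """
    # Imported here so the counting logic below does not itself depend on RDKit.
    from rdkit import Chem
    from rdkit.Chem import BRICS

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return set()
    # keepNonLeafNodes=True keeps the intermediate fragments of the hierarchy,
    # minFragmentSize stops the decomposition before it produces tiny pieces.
    return set(
        BRICS.BRICSDecompose(
            mol,
            minFragmentSize=min_fragment_size,
            keepNonLeafNodes=True,
        )
    )


def fragment_frequencies(smiles_list, fragmenter=brics_fragments):
    """Count in how many compounds each fragment occurs."""
    counts = Counter()
    for smi in smiles_list:
        # set() ensures a fragment is counted at most once per compound.
        counts.update(set(fragmenter(smi)))
    return counts
```

With a list of SMILES in smiles_list, something like
fragment_frequencies(smiles_list).most_common(20) would then list the 20
most frequent fragments together with the number of compounds that
contain each.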

Best,

Peter