Dear RDKitters, given a data set of let's say 2000 compounds, how do I extract the most common substructures rather than the maximum common substructures? In addition, I would like to output the frequency of the found substructures
E.g., the output would look like that N(CCc1ccccc1)C, frequency: 12 Nc1ccccc1, frequency: 10 Is such an analysis only possible via a brut-force approach? Or can one do it smarter? Cheers & Thanks, Paul This message and any attachment are confidential and may be privileged or otherwise protected from disclosure. If you are not the intended recipient, you must not copy this message or attachment or disclose the contents to any other person. If you have received this transmission in error, please notify the sender immediately and delete the message and any attachment from your system. Merck KGaA, Darmstadt, Germany and any of its subsidiaries do not accept liability for any omissions or errors in this message which may arise as a result of E-Mail-transmission or for damages resulting from any unauthorized changes of the content of this message and any attachment thereto. Merck KGaA, Darmstadt, Germany and any of its subsidiaries do not guarantee that this message is free of viruses and does not accept liability for any damages caused by any virus transmitted therewith. Click http://www.merckgroup.com/disclaimer to access the German, French, Spanish and Portuguese versions of this disclaimer. ------------------------------------------------------------------------------ LogMeIn Rescue: Anywhere, Anytime Remote support for IT. Free Trial Remotely access PCs and mobile devices and provide instant support Improve your efficiency, and focus on delivering more value-add services Discover what IT Professionals Know. Rescue delivers http://p.sf.net/sfu/logmein_12329d2d _______________________________________________ Rdkit-discuss mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/rdkit-discuss

