Dear Andrew,

thanks a lot for the quick hack and sorry for my late answer! I'm still interested in that issue, but I had no access to coding facilities for almost two days, what a shame!
> The current hooks in the MCS code don't make that possible. If you
> tweak the code a bit, I think you can get something working.
>
> % python hacked_fmcs.py sample_files/benzotriazole.sdf --maximize all-matches
> [#6]:1:[#6]:[#6]:[#6]:2:[#7]:[#7]:[#7]:[#6]:2:[#6]:1 9 atoms 10
> bonds (complete search)
> == Successful matches ==
> [#7]:[#7]
> [#6]:[#6]
> [#6]:[#7]
> [#6]:[#6]:[#7]
> [#7]:[#7]:[#7]
> [#6]:[#6]:[#6]
> [#6]:[#7]:[#7]
> [#7]:[#7]:[#6]
> ...
> [#6]:1:[#6]:[#6]:[#6]:2:[#7]:[#7]:[#7]:[#6]:2:[#6]:1
> [#6]:1:[#6]:[#6]:[#6](:[#6](:[#6]:1):[#7]:[#7]):[#7]
> [#6]:1:[#6]:[#6]:[#6](:[#6](:[#6]:1):[#7]):[#7]:[#7]
>
> I've attached both a patch against the current version of fmcs.py
> plus the full, hacked modification. It adds a new '.matches' field
> to the MCS result. The key is the SMARTS pattern tested, the value
> is if the pattern was in enough of the molecules. (In a non-hacked
> solution I would likely return only the valid matches.)

I tried your hacked script on the example you described below as well as on this small data set:

CCNCc1ccccc1
c1ccc(cc1)CN
CCNCc1ccccc1
c1cnc[nH]

Regarding the issues you mentioned:
- incomplete, untested & awkward naming
- rather slow speed
- non-canonical SMARTS
- duplicates are not filtered out
=> these restrictions are OK for me!

> > In addition, I would like to output the frequency of the found
> > substructures
>
> That's much harder from fmcs. Then again, I don't know what you mean
> by frequency.
>
> Consider the compounds CCOCC CCNCC CCPCCSCC
>
> The MCS is "CC". The "CC" exists twice (uniquely) in the first
> structure, twice in the second, and three times in the third.
>
> Is the frequency of "CC" 2 or 3?
>
> Or do you mean the number of molecules which contain that structure,
> in which case "CC" exists in 3 of the structures.

That's exactly what I meant.

> In either case, I don't think the right solution is to do this in
> fmcs.
> You have the SMARTS patterns and the molecules, so do a SMARTS
> match yourself and get the exact statistics you want.

Following up on that topic, and also taking into account the comments by Peter and Christos, it has become clearer to me that fmcs might not be appropriate for that kind of question.

You may be interested in how such a question came to my mind. Table 6 of this publication sparked my interest:
http://dx.doi.org/10.1002/cmdc.201000024

From a conversation with Andreas Bender, I learned that they did a brute-force all-against-all MCS search in Pipeline Pilot. It's a mystery to me how to code something similar in RDKit.

Following up on the example from the above section:

"
CCNCc1ccccc1
c1ccc(cc1)CN
CCNCc1ccccc1
c1cnc[nH]
"

=> I would be more than happy to finally have this output:

"
newflavorOfMCS    frequency
##########################
c1ccc(cc1)CN      3
c1cnc[nH]         1
"

Cheers & Thanks for all your input so far!
Paul

> Cheers,
>
> Andrew
> [email protected]
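P.S. For the archive, here is a minimal sketch of the "do a SMARTS match yourself" counting suggested above, where frequency means the number of molecules that contain a pattern at least once. The helper names `rdkit_has_match` and `count_supporting_molecules` are made up for illustration; the only RDKit calls used are `Chem.MolFromSmiles`, `Chem.MolFromSmarts` and `HasSubstructMatch`. It assumes the candidate substructures are already available as SMARTS strings (for example from the hacked fmcs output) and does not attempt the all-against-all MCS search itself.

```python
from typing import Callable, Dict, Iterable


def rdkit_has_match(smarts: str, smiles: str) -> bool:
    """Return True if `smiles` parses and contains the `smarts` pattern."""
    # Imported lazily so the counting helper below has no hard RDKit dependency.
    from rdkit import Chem

    mol = Chem.MolFromSmiles(smiles)
    patt = Chem.MolFromSmarts(smarts)
    if mol is None or patt is None:
        # Assumption of this sketch: unparsable input simply counts as "no match".
        return False
    return mol.HasSubstructMatch(patt)


def count_supporting_molecules(
    patterns: Iterable[str],
    molecules: Iterable[str],
    has_match: Callable[[str, str], bool] = rdkit_has_match,
) -> Dict[str, int]:
    """Map each distinct pattern to the number of molecules containing it.

    A molecule is counted once per pattern, no matter how many times the
    pattern occurs inside it. Duplicate molecules in the input are counted
    separately; duplicate patterns are collapsed.
    """
    molecules = list(molecules)
    counts: Dict[str, int] = {}
    for patt in dict.fromkeys(patterns):  # drop duplicate patterns, keep order
        counts[patt] = sum(1 for mol in molecules if has_match(patt, mol))
    return counts
```

With RDKit installed, calling `count_supporting_molecules(patterns, smiles_list)` on the four SMILES above should give a table of the kind I asked for. Two caveats: the last SMILES would probably need its ring closure written out (`c1cnc[nH]1`, my assumption of what was meant), and re-parsing every molecule for every pattern is slow, so for larger sets one would parse each molecule and pattern once up front.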

