The next quarterly release is coming soon, which was a good incentive for me to work on the new MCS code for RDKit.
I checked in the actual MCS code a few weeks ago, and Greg has reviewed it at least lightly. This evening I checked in the test cases. They are at:

  http://sourceforge.net/p/rdkit/code/2197/tree/trunk/rdkit/Chem/MCS.py
  http://sourceforge.net/p/rdkit/code/2197/tree/trunk/rdkit/Chem/UnitTestMCS.py

Here's the API and documentation.

def FindMCS(mols, min_num_atoms=2,
            maximize=Default.maximize,
            atom_compare=Default.atom_compare,
            bond_compare=Default.bond_compare,
            match_valences=Default.match_valences,
            ring_matches_ring_only=False,
            complete_rings_only=False,
            timeout=Default.timeout,
            ):
    """Find the maximum common substructure of a set of molecules

    @type mols: molecule iterator
    @param mols: find the MCS of these molecules
    @type min_num_atoms: integer
    @param min_num_atoms: The minimum number of atoms which must be
        in the MCS. The minimum value is 2.
    @type maximize: "atoms" or "bonds"
    @param maximize: The default "atoms" maximizes the number of atoms
        in the MCS. Use "bonds" to maximize the number of bonds instead.
    @type atom_compare: "any", "elements", or "isotopes"
    @param atom_compare: Specify the atom comparison function. The
        default "elements" says that two atoms are the same if and only
        if they have the same element number. Use "isotopes" if you are
        using isotope labels to define your own atom classes. With
        "any", all atoms match each other.
    @type bond_compare: "any" or "bondtypes"
    @param bond_compare: Specify the bond comparison function. The
        default "bondtypes" says that two bonds are the same if and
        only if they have the same bond type. With "any", all bonds
        match each other.
    @type match_valences: boolean
    @param match_valences: If True, atoms must also have matching
        valences to match. By default this is False.
    @type ring_matches_ring_only: boolean
    @param ring_matches_ring_only: If True, then both bonds must either
        be in a ring or not in a ring in order to match. By default
        this is False.
    @type complete_rings_only: boolean
    @param complete_rings_only: If True, then if a ring bond of a
        molecule is in the MCS then the corresponding MCS bond is also
        in a ring.
    @type timeout: float
    @param timeout: Stop the search after 'timeout' seconds and report
        the current best MCS.
    @rtype: MCSResult
    @return: Information about the MCS search results. Attributes are
        'completed' (0 if timeout reached, otherwise 1), 'num_atoms',
        'num_bonds', and 'smarts'.
    """

Is that too unreadable? I can rewrite it as prose instead of this sort of auto-documentation format.

I wasn't sure how or where to update the documentation. Greg, do you have an idea of where it might go? Or would you perhaps want to do it yourself?

In other news, I'll be presenting the MCS work at the Goslar conference this fall, and I've got some funding to work on finding a threshold MCS, that is, the maximum common substructure which is present in at least some threshold percentage of the entire set of molecules. Can anyone suggest a good name for that? Threshold MCS? Something else?

Cheers,

				Andrew
				da...@dalkescientific.com
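To make the three atom_compare options from the docstring a bit more concrete, here is a rough sketch in plain Python. It is not the code in MCS.py: the Atom tuple and the atoms_match function are hypothetical names made up for illustration, and it assumes that "isotopes" compares only the isotope label (so the element is ignored), which the docstring suggests but does not state outright.

```python
from collections import namedtuple

# Hypothetical stand-in for an atom; this is not an RDKit class.
Atom = namedtuple("Atom", ["atomic_num", "isotope"])


def atoms_match(a, b, atom_compare="elements"):
    """Return True if atoms a and b are equivalent under the given mode."""
    if atom_compare == "any":
        # Every atom matches every other atom.
        return True
    if atom_compare == "elements":
        # Same element number means same atom type.
        return a.atomic_num == b.atomic_num
    if atom_compare == "isotopes":
        # Assumption: isotope labels act as user-defined atom classes,
        # so only the label is compared and the element is ignored.
        return a.isotope == b.isotope
    raise ValueError("unknown atom_compare: %r" % (atom_compare,))


carbon = Atom(atomic_num=6, isotope=0)
labeled_carbon = Atom(atomic_num=6, isotope=13)
nitrogen = Atom(atomic_num=7, isotope=0)

print(atoms_match(carbon, labeled_carbon))              # True
print(atoms_match(carbon, nitrogen))                    # False
print(atoms_match(carbon, nitrogen, "any"))             # True
print(atoms_match(carbon, labeled_carbon, "isotopes"))  # False
```

Under that assumption, two atoms of different elements that carry the same isotope label would match with "isotopes", which is what makes the labels usable as custom atom classes.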