On Sep 1, 2012, at 6:00 PM, James Swetnam wrote: > Sorry, hit send prematurely. > > I'd define a separate function FindMCSWithTimeout for the timeout parameter > unless a majority of users will be concerned with this method's running time, > but I'm not familiar with the algorithm's details.
The MCS algorithm is one of those normally fast but not infrequently exponential slow algorithms. In my various tests, including simple pairwise tests of 10,000 randomly chosen pairs of structures from ChEMBL 13, I found that: 7 pairs took longer than 30 seconds, 9977 took less than 2 seconds, 9958 took less than 1 second, 9420 took less than 0.1 seconds, 4820 took less than 0.01 seconds. 1513 took less than 0.001 seconds As I recall, sometimes the search can take many hours. My belief is that the slow runtimes occur often enough that everyone will use a timeout. I also believe that early on, people who use the API will test with structures where the MCS is easy to find. Only later on will they get stuck in some slow-downs (which is especially a problem on a web server), and then have to diagnose it. BTW, unlike, say, the CDK code, timeout failures do NOT raise an exception. One of the advantages of my algorithm over some of the other ones is that I can report the best size to date. The returned MCSResult contains: - number of atoms in the MCS - number of bonds in the MCS - SMARTS pattern - a 'completed' flag, to know if MCS search completed (True) or stopped due to some other reason, like a timeout (False). Indigo's scaffold finder uses a per-indigo-instance variable to set the timeout. My testing showed that Indigo's code, while faster than fmcs, still has timeout issues. CDK's MCS code implements a timeout. Small Molecule Subgraph Detector (SMSD) does not have a timeout. I had to write my own wrapper around it to kill it when it took too long. OEChem's MCS code does not seem to have a timeout. They have two different methods, one approximate and the other exact. I don't have the experience to judge if there are problems when using the exact method in practice. The existence of timeouts in other codes suggests that it's important, and it's hard to remove the need. // This is OpenBabel like. It's orthogonal to the method itself. I don't know what you mean by orthogonal. That is, why does the run-time allocation any more orthogonal than the binary options ring_matches_ring_only = False, complete_rings_only = False, There are several ways to handle all of the parameters: - have a function which takes all of the parameters - create a search class, where some parameters can be set in the constructor. - and/or, which has methods methods like "SetTimeout(double seconds)" - create a "search parameters" class, which holds all of the values and is passed to the search function - create different classes, one for each parameterization - like Indigo, support multiple toolkit instance, where each instance can have its own settings - introduce global (or per-thread) variables Of these, I thought that a single function call interface was the easiest to understand. Also, I lean towards the functional programming ideal of not changing state, so don't like "SetTimeout" or the approaches which set some sort of (semi-) global parameter. > Agree with Greg's assertion to rename to FindMCS. Yes, I'll be changing the name. > // Rename Default to something like MCSParams, and have a factory > function returning a DefaultMCSParams I assume this is the command pattern? It's not a command pattern. It's minimal data only. ================= ### A place to set global options # (Is this really useful?) class Default(object): timeout = None timeout_string = "none" maximize = "bonds" atom_compare = "elements" bond_compare = "bondtypes" match_valences = False ring_matches_ring_only = False complete_rings_only = False ================== > // Logging should really be separated from API methods. You should be > using the python logging framework such that users > // can modify these globally and not through the API Then perhaps "logging" is the wrong name for it. "Progress" is much better. It's used in the command-line tool to show progress information. When something takes 30 seconds to run, it's comforting to know that the underlying code is doing something. This would also be true for a GUI display. Ideally the API would support a progress handler, along with an option to abort. I think it's best to strip it out of the RDKit version of the code. What's there is only for the command-line tool, and I don't have the time/funding to develop a proper API. BTW, I don't think Python's logging system is appropriate for this. Actually, I don't like using logs for most things, but that's a different topic. > verbose_delay=1.0, > // Also remove from this function Yes. See above. This is an example of something which is hard to integrate with the logging API. I could log every event, but that's useless. Instead, I need to provide some sample. I could do it every N events, but that's CPU- and load-specific. So I chose to use a time delay. I don't know how to configure logging to handle that case. On Fri, Aug 31, 2012 at 8:37 PM, Greg Landrum <greg.land...@gmail.com> wrote: > I would have no objection to the command-line driver being there in a > separate file in the package; it could well be useful to people. It could well be. However: 1) this for me is more of a contractual obligation, in that my client asked me nicely (it's not actually in the contract) to put it into RDKit. I don't have much free time/funding to do this. 2) The command-line tool was designed for my client, and while I think it's general enough, I haven't gotten any feedback about it. The isotope-label-as-atom-classes hack is my biggest concern here. 3) As you all can see, the API is designed specifically for the command-line tool. Stripping out the currently named 'logging' code would mean also modifying the command-line driver. That's more work. Especially since the work I would like to do is to improve the algorithm. 4) I would also have to get a sense of what RDKit's command-line parameter style looks like, so that the code fits in. Hence, rather more work than I would like to take on when so far neither Greg nor my clients have asked for it. Andrew da...@dalkescientific.com ------------------------------------------------------------------------------ Live Security Virtual Conference Exclusive live event will cover all the ways today's security and threat landscape has changed and how IT managers can respond. Discussions will include endpoint security, mobile security and the latest in malware threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ _______________________________________________ Rdkit-devel mailing list Rdkit-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-devel