On 10/19/10 9:58 PM, Geoffrey Hutchison wrote: >> and there's a comment from ghutchis last year saying he should have some >> code that does this fairly soon, so I guess this question is directed at >> Geoff: Is this something that's stayed out because it's low priority or >> because it turns out to be more difficult than anticipated? >> >> If someone can point me in the right direction (and it doesn't >> involve major re-writing of the smarts parser) I'd be happy to >> try and get this working... > > It's a lot more work than anticipated. (I also opened my big mouth > recently about fixing cis/trans matching for SMARTS.) For one, the code > for *parsing* the SMARTS is tricky to modify. I have also seen these > projects have also been lower priority than fixing stereo errors, > crashes, etc. > > For parsing, there's currently a block on '.' but once you remove that, > you'd want to handle it something like an AND operator. (Sorry, it's > late, so I can't remember operator precedence on '.' versus the ANDs in > SMARTS.) > > Once you add '.' as another type of operator, my approach was something like > this: > 1) Ensure that each component matched the OBMol as a SMARTS itself (i.e., > throw away easy rejects) > 2) Ensure that there are at least as many contiguous fragments in the > OBMol as there are components in the SMARTS (more rejects). > 3) For each component SMARTS, go through the matches and determine which > contiguous fragment it's in. > 4) Now the hard work -- iterating through the components and their matches: > * OK, now you have a list of component SMARTS, each with a list of matching > contiguous fragments: > * for component one, take a matching fragment. > * check component two, see if there's a match that's *not* the same as > component one > * If no, reject early > * If yes, continue down the line to make sure we can find unique fragments > for each SMARTS component > > I'm sure that's the naive, brute-force method, but it should work. I'd be > more than happy to give further pointers and/or hand-holding.
I think there's a much simpler and probably faster approach. You just need to make it an atom property. If you detect that a SMARTS contains groups: 1. Number the groups, like 1..N and note which group each atom of the SMARTS is in. This becomes just another atom property, like atomic number. 2. Find how many disconnected components the molecule has, and number them 1 to N. Assign each atom a property, its group number. (This is fast, it can be done in O(N) time where N is the number of atoms.) 3. Create a map of SMARTS-group to molecule-group, which is initially empty. 4. When you match a SMARTS atomic expresssion, check the map. a. If that SMARTS atom's group isn't in the map yet, add an entry that associates the SMARTS group with the molecule's group. b. If the SMARTS atoms's group is in the map, it must map to the molecule atom's group, or else the match fails. 5. If a match fails and you backtrack, clean out the map entries that you created when an atom was matched. This way, you haven't altered the fundamental graph isomorphism algorithm. The grouping becomes just another atomic property. It's just a bit peculiar because the property is dynamic, that is, it depends on what's already been matched. Craig ------------------------------------------------------------------------------ Nokia and AT&T present the 2010 Calling All Innovators-North America contest Create new apps & games for the Nokia N8 for consumers in U.S. and Canada $10 million total in prizes - $4M cash, 500 devices, nearly $6M in marketing Develop with Nokia Qt SDK, Web Runtime, or Java and Publish to Ovi Store http://p.sf.net/sfu/nokia-dev2dev _______________________________________________ OpenBabel-Devel mailing list OpenBabel-Devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/openbabel-devel