I need to do it on an indefinite number of decompositions of some molecule, each expressed as 1 SMILES + n SMILES fragments (starred SMILES). For example, CCC can be decomposed into C* C *C, and I want to highlight the SMILES in blue and the 2 fragments in yellow.
Ivan put me on a good path:

    from rdkit.Chem import MolFromSmiles, MolFromSmarts

    m = MolFromSmiles('CCC')
    s = MolFromSmarts('[$(*-C)].[$(*-C-*)].[$(*-C)]')
    m.GetSubstructMatches(s)
    # ((0, 2, 1),)
    s = MolFromSmarts('[$(*-C)].[$(*-C)].[$(*-C-*)]')
    m.GetSubstructMatches(s)
    # ((0, 1, 2),)

If I change the order of the SMARTS query, the result changes... so maybe there is an order? But I cannot work it out. In CCC the atom indices are 0, 1, 2, so it is not the obvious order of the SMARTS.

If there is an order in the GetSubstructMatches() result and I can understand it, I can solve my problem.

Another approach would be to express the context in the recursive SMARTS in an "exclusive" way. For example, [$(C-*)] actually matches ALL atoms of CCC, which means it matches C* but also *C*. How can I express "those bonds, AND ONLY those"? (excluding H, of course)

It always amazes me to see how an obvious thing can be so not obvious... kind of a captcha thing. I seriously hope to sort this out with SMARTS, without any graph approach...

Thomas

On Fri, Mar 5, 2021 at 18:08, Ivan Tubert-Brohman <ivan.tubert-broh...@schrodinger.com> wrote:

> Hi Thomas,
>
> I believe what you want can be done using recursive SMARTS and
> disconnected SMARTS. For example,
>
> In [7]: mol = Chem.MolFromSmiles('CCC=C')
>
> In [8]: mol.GetSubstructMatches(Chem.MolFromSmarts('[$(C-*)].CC.[$(C=*)]'))
> Out[8]: ((0, 1, 2, 3),)
>
> The recursive SMARTS let you match a single atom, but specifying its
> context. [$(C=*)] means match any atom, as long as it's a carbon with a
> double bond to any other atom. Importantly, the "any other atom" is not
> "consumed", so it can still be matched elsewhere in the SMARTS.
>
> The SMARTS above won't guarantee that there are no gaps, but you could
> independently check that the number of atoms in the molecule equals the
> number of atoms in the SMARTS.
>
> Hope this helps,
> Ivan
>
> On Fri, Mar 5, 2021 at 7:36 AM Thomas <odioidenti...@gmail.com> wrote:
>
>> Is it possible to search for a fragment that is not a valid structure
>> itself, but part of a structure?
>>
>> Problem: "Given a structure, and a decomposition of the structure,
>> highlight each part with a different color"
>> The decomposition is always in the form of 1 SMILES and n SMILES FRAGMENTS.
>> The "SMILES fragments" are marked with an asterisk at the "connection
>> bonds".
>>
>> For example:
>> mol: CCC=C
>> decomposition: C* CC *=C
>>
>> For a human it takes nothing to spot "who is who", but how would you
>> approach it?
>>
>> - I cannot match the SMARTS "C=": it's not a valid SMARTS
>> - I cannot match it without the broken bonds: I would lose the difference
>> between C* and C=*
>> - I cannot match it as it is: the asterisks will match the first atom
>> of the other fragment. (Maybe there is a way to get which part matched
>> which? In that case I could remove the atom matching the asterisk...)
>>
>> Maybe there is an easy way to represent this pattern 'C=' in SMARTS, but
>> the Daylight manual is not clear about it. Or maybe I'm just too lazy to
>> get it...
>>
>> In other words: is it possible to write n SMARTS that together match the
>> whole structure (all the atoms and all the bonds, with no overlaps and
>> no gaps)? Because if the SMARTS must be a complete structure (without
>> "unbonded" bonds), that's actually not possible.
>> Thank you
>> _______________________________________________
>> Rdkit-discuss mailing list
>> Rdkit-discuss@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>
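
A possible way to read the match tuples discussed in this thread, offered as a minimal sketch rather than a definitive answer: as far as the RDKit documentation describes it, each tuple returned by GetSubstructMatches() is ordered by query atom, so entry i is the molecule atom matched by the i-th atom of the SMARTS. Under that assumption, a tuple can be cut into one group per dot-separated component and checked for gaps and overlaps, as Ivan suggests. The helper names (split_match, covers_molecule, colour_map) and the colour constants are hypothetical, and the caller is assumed to know how many top-level query atoms each component has (a recursive [$(...)] counts as one). The sketch is plain Python and works on the tuples quoted in the thread, so it does not call RDKit itself.

```python
from itertools import accumulate

# Hypothetical colour constants (RGB in the 0..1 range).
YELLOW = (1.0, 1.0, 0.0)
BLUE = (0.0, 0.0, 1.0)


def split_match(match, fragment_sizes):
    """Cut one match tuple into per-component groups of atom indices.

    Assumes `match` is in query-atom order and `fragment_sizes` gives the
    number of top-level query atoms of each dot-separated SMARTS component.
    """
    if sum(fragment_sizes) != len(match):
        raise ValueError("fragment sizes do not add up to the match length")
    ends = list(accumulate(fragment_sizes))
    starts = [0] + ends[:-1]
    return [tuple(match[a:b]) for a, b in zip(starts, ends)]


def covers_molecule(match, num_atoms):
    """True if the match uses every atom index exactly once (no gaps, no overlaps)."""
    return len(match) == num_atoms and set(match) == set(range(num_atoms))


def colour_map(groups, colours):
    """Map each atom index to the colour of the group it belongs to."""
    return {idx: colour for group, colour in zip(groups, colours) for idx in group}


# Tuple quoted in the thread: 'CCC=C' matched with '[$(C-*)].CC.[$(C=*)]'.
match = (0, 1, 2, 3)
groups = split_match(match, [1, 2, 1])          # [(0,), (1, 2), (3,)]
complete = covers_molecule(match, 4)            # True
atom_colours = colour_map(groups, [YELLOW, BLUE, YELLOW])

# Tuple quoted in the thread: 'CCC' matched with '[$(*-C)].[$(*-C-*)].[$(*-C)]'.
# Read in query order, the second component matched atom 2, not atom 1.
ccc_groups = split_match((0, 2, 1), [1, 1, 1])  # [(0,), (2,), (1,)]
```

The resulting dictionary has the atom-index-to-colour shape that, I believe, RDKit's MolDraw2D.DrawMolecule accepts through its highlightAtomColors argument. On the "exclusive" context question, the SMARTS degree primitive D (number of explicit connections) may be worth trying inside the recursive part, for example [CD1] for C* versus [CD2] for *C*; that suggestion is untested here.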