Re: [Rdkit-discuss] SMARTS representing a fragment (with "unbonded" bonds)
I need to do this for an arbitrary number of decompositions of a molecule, each expressed as 1 SMILES + n SMILES fragments (starred SMILES).
For example, CCC can be decomposed into:

    C*  C  *C

and I want to highlight the SMILES in blue and the 2 fragments in yellow.

Ivan put me on a good path:

    from rdkit.Chem import MolFromSmiles, MolFromSmarts

    m = MolFromSmiles('CCC')
    s = MolFromSmarts('[$(*-C)].[$(*-C-*)].[$(*-C)]')
    m.GetSubstructMatches(s)
    # ((0, 2, 1),)

    s = MolFromSmarts('[$(*-C)].[$(*-C)].[$(*-C-*)]')
    m.GetSubstructMatches(s)
    # ((0, 1, 2),)

If I change the order of the SMARTS query, the result changes, so maybe there is an order? But I cannot work out what it is. In CCC the atom indices are 0, 1, 2, so the result does not obviously follow the order of the SMARTS.
If there is an order in the GetSubstructMatches() result and I can understand it, I can solve my problem.

Another approach would be to express the context in the recursive SMARTS in an "exclusive" way. For example, [$(C-*)] actually matches ALL atoms of CCC, which means it matches C* but also *C*.
How can I express "those bonds, AND ONLY those"? (excluding H, of course)

It always amazes me how an obvious thing can be so not obvious... kind of a captcha thing.
I seriously hope to sort this out with SMARTS, without any graph approach...

Thomas

On Fri, Mar 5, 2021 at 18:08, Ivan Tubert-Brohman <ivan.tubert-broh...@schrodinger.com> wrote:

> Hi Thomas,
>
> I believe what you want can be done using recursive SMARTS and
> disconnected SMARTS. For example,
>
> In [7]: mol = Chem.MolFromSmiles('CCC=C')
>
> In [8]: mol.GetSubstructMatches(Chem.MolFromSmarts('[$(C-*)].CC.[$(C=*)]'))
>
> Out[8]: ((0, 1, 2, 3),)
>
> The recursive SMARTS let you match a single atom, but specifying its
> context. [$(C=*)] means match any atom, as long as it's a carbon with a
> double bond to any other atom. Importantly, the "any other atom" is not
> "consumed", so it can still be matched elsewhere in the SMARTS.
>
> The SMARTS above won't guarantee that there are no gaps, but you could
> independently check that the number of atoms in the molecule equals the
> number of atoms in the SMARTS.
>
> Hope this helps,
> Ivan

___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
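On the ordering question above: as far as I know, each tuple returned by GetSubstructMatches() is ordered by query atom, so position i holds the index of the molecule atom matched by query atom i. The sketch below checks this on CCC and also tries the SMARTS degree primitive D as one possible way to say "these bonds and only these". The variable names are illustrative and not from the thread, and the sketch assumes a molecule parsed from SMILES without explicit hydrogens, so that D counts heavy-atom neighbours only.

```python
from rdkit import Chem

mol = Chem.MolFromSmiles("CCC")  # atom indices 0, 1, 2; atom 1 is the middle carbon

# Query atoms are numbered in the order they appear in the SMARTS.
# Position i of every match tuple is the molecule atom matched by query atom i.
query = Chem.MolFromSmarts("[$(*-C)].[$(*-C-*)].[$(*-C)]")
all_matches = mol.GetSubstructMatches(query, uniquify=False)

# [$(*-C-*)] is anchored on its FIRST atom, so it describes an atom at the end
# of a three-atom path. In CCC that is a terminal carbon, not the middle one.
anchored_atoms = {match[1] for match in all_matches}
print(anchored_atoms)

# Degree-constrained alternative (an assumption about what "exclusive" means):
# D1 = exactly one explicit connection, D2 = exactly two.
exclusive = Chem.MolFromSmarts("[CD1].[CD2].[CD1]")
exclusive_matches = mol.GetSubstructMatches(exclusive, uniquify=False)
middle_atoms = {match[1] for match in exclusive_matches}
print(middle_atoms)
```

With uniquify=False every valid assignment is listed, which makes it easier to see that changing the order of the query components only changes the positions inside each tuple, not which atoms are allowed at each position.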
Re: [Rdkit-discuss] SMARTS representing a fragment (with "unbonded" bonds)
Hi Thomas,

I believe what you want can be done using recursive SMARTS and disconnected SMARTS. For example,

In [7]: mol = Chem.MolFromSmiles('CCC=C')

In [8]: mol.GetSubstructMatches(Chem.MolFromSmarts('[$(C-*)].CC.[$(C=*)]'))

Out[8]: ((0, 1, 2, 3),)

The recursive SMARTS let you match a single atom, but specifying its context. [$(C=*)] means match any atom, as long as it's a carbon with a double bond to any other atom. Importantly, the "any other atom" is not "consumed", so it can still be matched elsewhere in the SMARTS.

The SMARTS above won't guarantee that there are no gaps, but you could independently check that the number of atoms in the molecule equals the number of atoms in the SMARTS.

Hope this helps,
Ivan

On Fri, Mar 5, 2021 at 7:36 AM Thomas wrote:

> Is it possible to search for a fragment that is not a valid structure
> itself, but part of a structure?
>
> Problem: "Given a structure, and a decomposition of the structure,
> highlight each part with a different color"
> The decomposition is always in the form of 1 SMILES and n SMILES FRAGMENTS.
> The "SMILES fragments" are marked with an asterisk at the "connection
> bonds".
>
> For example:
> mol: CCC=C
> decomposition: C* CC *=C
>
> For a human it takes nothing to spot "who is who", but how would you
> approach it?
>
> - I cannot match the SMARTS "C=": it's not a valid SMARTS
> - I cannot match it without the broken bonds: I would lose the difference
>   between C* and C=*
> - I cannot match it as it is: the asterisks will match the first atom of
>   the other fragment. (Is there maybe a way to find out which part matched
>   which? In that case I could remove the atom matching the asterisk...)
>
> Maybe there is an easy way to represent this pattern 'C=' in SMARTS, but
> the Daylight manual is not clear about it. Or maybe I'm just too lazy to
> get it.
>
> In other words: is it possible to write n SMARTS that together match the
> whole structure (all the atoms and all the bonds, with no overlapping and
> no gaps)? Because if the SMARTS must be a complete structure (without
> "unbonded" bonds), that's actually not possible.
>
> Thank you
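The approach Ivan describes could be sketched roughly as follows: match the disconnected query, keep only matches that account for every atom of the molecule (the "no gaps" check), and then split each match into one group of atom indices per query component. Using Chem.GetMolFrags() on the query to recover its components is my own assumption about how to do the split, not something stated in the thread.

```python
from rdkit import Chem

mol = Chem.MolFromSmiles("CCC=C")
query = Chem.MolFromSmarts("[$(C-*)].CC.[$(C=*)]")

# Connected components of the query, as tuples of query atom indices.
query_parts = Chem.GetMolFrags(query)

assignments = []
for match in mol.GetSubstructMatches(query):
    # No gaps: the query must account for every atom of the molecule.
    if len(match) != mol.GetNumAtoms():
        continue
    # match[i] is the molecule atom matched by query atom i, so each query
    # component translates into a group of molecule atom indices.
    assignments.append([tuple(match[i] for i in part) for part in query_parts])

print(assignments)
```

For the match reported in the message above, ((0, 1, 2, 3),), this would give the groups (0,), (1, 2) and (3,). Note that the recursive SMARTS are not exclusive, so other assignments of the same four atoms may also be valid matches that the default uniquify=True hides.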
[Rdkit-discuss] SMARTS representing a fragment (with "unbonded" bonds)
Is it possible to search for a fragment that is not a valid structure itself, but part of a structure?

Problem: "Given a structure, and a decomposition of the structure, highlight each part with a different color"
The decomposition is always in the form of 1 SMILES and n SMILES FRAGMENTS.
The "SMILES fragments" are marked with an asterisk at the "connection bonds".

For example:
mol: CCC=C
decomposition: C* CC *=C

For a human it takes nothing to spot "who is who", but how would you approach it?

- I cannot match the SMARTS "C=": it's not a valid SMARTS
- I cannot match it without the broken bonds: I would lose the difference between C* and C=*
- I cannot match it as it is: the asterisks will match the first atom of the other fragment. (Is there maybe a way to find out which part matched which? In that case I could remove the atom matching the asterisk...)

Maybe there is an easy way to represent this pattern 'C=' in SMARTS, but the Daylight manual is not clear about it. Or maybe I'm just too lazy to get it.

In other words: is it possible to write n SMARTS that together match the whole structure (all the atoms and all the bonds, with no overlapping and no gaps)? Because if the SMARTS must be a complete structure (without "unbonded" bonds), that's actually not possible.

Thank you
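The "no overlapping and no gaps" requirement in the question above can be checked independently of how the parts were matched. The following is a minimal pure-Python sketch; the function names is_exact_partition and colours_by_atom are hypothetical helpers, and the parts are assumed to be given as groups of atom indices of the parent molecule.

```python
from collections import Counter


def is_exact_partition(num_atoms, parts):
    """Return True if every atom index 0..num_atoms-1 occurs in exactly one part."""
    counts = Counter(idx for part in parts for idx in part)
    return len(counts) == num_atoms and all(
        0 <= idx < num_atoms and n == 1 for idx, n in counts.items()
    )


def colours_by_atom(parts, palette):
    """Map each atom index to the colour of the part it belongs to."""
    return {idx: palette[i] for i, part in enumerate(parts) for idx in part}


# CCC=C split as C* / CC / *=C, using the atom indices of the parent molecule.
parts = [(0,), (1, 2), (3,)]
yellow, blue = (1.0, 1.0, 0.0), (0.0, 0.0, 1.0)

print(is_exact_partition(4, parts))
print(is_exact_partition(4, [(0, 1), (1, 2), (3,)]))  # atom 1 is claimed twice
print(colours_by_atom(parts, [yellow, blue, yellow]))
```

A dictionary of this shape (atom index to RGB tuple) is the kind of argument the RDKit drawing functions accept as highlightAtomColors, so it could serve as the link between a match and a coloured depiction.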
Re: [Rdkit-discuss] Drawing options: noCarbonSymbols = False
Thank you,

I was looking for more readable images: for example, in OpenBabel "the default is to draw all hetero atoms and terminal C explicitly, together with their attached hydrogens."
I'm only just starting with RDKit, so I was wondering whether there was something wrong in my approach. "Not available" is a clear answer, but it leads me to new questions:

1) What is the "official" way to pass drawing options?
2) Where can I find up-to-date documentation of the available drawing options? (https://www.rdkit.org/docs/source/rdkit.Chem.Draw.MolDrawing.html is apparently outdated, since "noCarbonSymbols" is still there)

It's just that outdated documentation can be frustrating: I spent hours trying to understand what I was doing wrong.
For example, let's say I want all atoms and bonds black. I found a snippet in your cookbook that requires importing IPython for that: is it really necessary?

Thank you

On Thu, Mar 4, 2021 at 06:52, Greg Landrum wrote:

> Hi Thomas,
>
> noCarbonSymbols was an option for the old drawing code and is not
> currently available in the new drawing code since it didn't seem to be
> particularly useful.
>
> We can maybe provide suggestions if you provide an example image that
> you're getting and explain what you'd like to change about it.
>
> Best,
> -greg
>
> On Tue, Mar 2, 2021 at 8:49 PM Thomas wrote:
>
>> Hi everybody,
>> I'm very new to RDKit; can anybody explain to me how to set drawing
>> options?
>> I wanted to do something easy like:
>>
>>     MolToImage(mol, noCarbonSymbol=False)
>>
>> but apparently that is not the right way. Following some recipes I also
>> tried:
>>
>>     drawer = rdMolDraw2D.MolDraw2DCairo(100, 100)
>>     drawer.drawOptions().noCarbonSymbols = False
>>
>> but again no luck. I also tried to modify the MolDrawOptions object
>>
>>     options = drawer.drawOptions()
>>
>> but there is nothing about "noCarbonSymbol"...
>>
>> Can anybody point me to the right path? Much appreciated
>> Thomas
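On the two questions above about passing drawing options and drawing everything in black: as far as I can tell, options for the new drawing code are set on the MolDrawOptions object returned by drawer.drawOptions() before the molecule is drawn, and no IPython import is needed for that. The sketch below uses the SVG drawer to avoid depending on Cairo. The explicitMethyl option is my assumption about the closest replacement for explicit terminal carbons and may not be present in older RDKit versions.

```python
from rdkit import Chem
from rdkit.Chem.Draw import rdMolDraw2D

mol = Chem.MolFromSmiles("CCO")

drawer = rdMolDraw2D.MolDraw2DSVG(250, 200)
opts = drawer.drawOptions()   # MolDrawOptions object belonging to this drawer
opts.useBWAtomPalette()       # draw all atoms and bonds in black
opts.explicitMethyl = True    # assumption: supported by the installed version

# PrepareAndDrawMolecule computes 2D coordinates if the molecule has none.
rdMolDraw2D.PrepareAndDrawMolecule(drawer, mol)
drawer.FinishDrawing()
svg = drawer.GetDrawingText()

print(len(svg))
```

The attribute names available on the options object can be listed with dir(opts), which may be more reliable than documentation pages written for the old drawing code.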