Re: [Rdkit-discuss] SMARTS representing a fragment (with "unbonded" bonds)

2021-03-05 Thread Thomas
I need to do this on an indefinite number of decompositions of a molecule,
expressed in the form of 1 SMILES + n SMILES fragments (starred SMILES).
For example, CCC can be decomposed into C*, C and *C, and I want to
highlight the SMILES in blue and the 2 fragments in yellow.

Ivan put me on a good path:

from rdkit.Chem import MolFromSmiles, MolFromSmarts

m = MolFromSmiles('CCC')
s = MolFromSmarts('[$(*-C)].[$(*-C-*)].[$(*-C)]')
m.GetSubstructMatches(s)
((0, 2, 1),)

s = MolFromSmarts('[$(*-C)].[$(*-C)].[$(*-C-*)]')
m.GetSubstructMatches(s)
((0, 1, 2),)

If I change the order of the SMARTS query, the result changes... so maybe
there is an order? But I cannot understand it.
In CCC the atom indices are 0, 1, 2, so the result is not in the obvious
order of the SMARTS. If there is an order in the GetSubstructMatches() result
and I can understand it, I can solve my problem.
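A short sketch of how the result can be read, assuming current RDKit behaviour: each tuple returned by GetSubstructMatches() is ordered like the query atoms, so element i is the molecule atom matched by query atom i. Here the query atoms are numbered 0, 1, 2 in the order the SMARTS lists them.

```python
from rdkit import Chem

mol = Chem.MolFromSmiles('CCC')
query = Chem.MolFromSmarts('[$(*-C)].[$(*-C-*)].[$(*-C)]')

for match in mol.GetSubstructMatches(query):
    # match[i] is the index of the molecule atom matched by query atom i
    for query_idx, mol_idx in enumerate(match):
        print(f'query atom {query_idx} -> molecule atom {mol_idx}')
```

Read this way, the first result above, ((0, 2, 1),), says that the middle component [$(*-C-*)] landed on molecule atom 2, a terminal carbon: a terminal carbon also satisfies *-C-* when the pattern is rooted on it (atom 2, then atom 1, then atom 0).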

Another approach would be to express the context in the recursive SMARTS in
an "exclusive" way.
For example, [$(C-*)] actually matches ALL atoms of CCC: it matches C* but
also *C*. How can I express "these bonds, AND ONLY these"? (Excluding H, of
course.)
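One possible way to make the context "exclusive", sketched as an assumption rather than a recipe tested on every case: the SMARTS D primitive counts explicit connections. With implicit hydrogens (as in a molecule parsed from SMILES), [C;D1] is a carbon with exactly one heavy-atom neighbour and [C;D2] a carbon with exactly two.

```python
from rdkit import Chem

mol = Chem.MolFromSmiles('CCC')

# C*: a carbon with exactly one explicit neighbour, single-bonded to it
terminal = Chem.MolFromSmarts('[$([C;D1]-*)]')
# *C*: a carbon with exactly two explicit neighbours, both single-bonded
middle = Chem.MolFromSmarts('[$([C;D2](-*)-*)]')

print(mol.GetSubstructMatches(terminal))  # only the two end carbons
print(mol.GetSubstructMatches(middle))    # only the central carbon

# In one disconnected query, each component can now only land on
# atoms that have exactly the bonds it describes
whole = Chem.MolFromSmarts('[$([C;D1]-*)].[$([C;D2](-*)-*)].[$([C;D1]-*)]')
print(mol.GetSubstructMatches(whole))
```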

It always amazes me how something obvious can be so non-obvious... a
kind of captcha thing. I really hope to sort this out with SMARTS,
without any graph approach...
Thomas

On Fri, Mar 5, 2021 at 18:08 Ivan Tubert-Brohman <
ivan.tubert-broh...@schrodinger.com> wrote:

> Hi Thomas,
>
> I believe what you want can be done using recursive SMARTS and
> disconnected SMARTS. For example,
>
> In [7]: mol = Chem.MolFromSmiles('CCC=C')
>
> In [8]:
> mol.GetSubstructMatches(Chem.MolFromSmarts('[$(C-*)].CC.[$(C=*)]'))
>
> Out[8]: ((0, 1, 2, 3),)
>
> The recursive SMARTS let you match a single atom, but specifying its
> context. [$(C=*)] means match any atom, as long as it's a carbon with a
> double bond to any other atom. Importantly, the "any other atom" is not
> "consumed", so it can still be matched elsewhere in the SMARTS.
>
> The SMARTS above won't guarantee that there are no gaps, but you could
> independently check that the number of atoms in the molecule equals the
> number of atoms in the SMARTS.
>
> Hope this helps,
> Ivan
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] SMARTS representing a fragment (with "unbonded" bonds)

2021-03-05 Thread Ivan Tubert-Brohman
Hi Thomas,

I believe what you want can be done using recursive SMARTS and disconnected
SMARTS. For example,

In [7]: mol = Chem.MolFromSmiles('CCC=C')  # assumes: from rdkit import Chem

In [8]: mol.GetSubstructMatches(Chem.MolFromSmarts('[$(C-*)].CC.[$(C=*)]'))

Out[8]: ((0, 1, 2, 3),)

Recursive SMARTS let you match a single atom while specifying its
context. [$(C=*)] means: match any atom, as long as it is a carbon with a
double bond to some other atom. Importantly, that other atom is not
"consumed", so it can still be matched elsewhere in the SMARTS.
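The non-consuming behaviour can be checked directly. A minimal sketch, assuming a standard RDKit install: the recursive query is a single atom, it matches both carbons of the double bond, and two copies of it can match those two carbons at the same time, each using the other as its "any other atom".

```python
from rdkit import Chem

mol = Chem.MolFromSmiles('CCC=C')

# One query atom, plus a condition on its environment
double_bonded_c = Chem.MolFromSmarts('[$(C=*)]')
print(double_bonded_c.GetNumAtoms())              # the query has a single atom
print(mol.GetSubstructMatches(double_bonded_c))   # both ends of C=C qualify

# Two copies can match atoms 2 and 3 together: the neighbour required by
# each recursive condition is not used up
pair = Chem.MolFromSmarts('[$(C=*)].[$(C=*)]')
print(mol.GetSubstructMatches(pair))
```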

The SMARTS above won't guarantee that there are no gaps, but you could
independently check that the number of atoms in the molecule equals the
number of atoms in the SMARTS.
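The atom-count check could look like this. covers_all_atoms is a hypothetical helper, and it only checks atoms, not bonds: since a match maps distinct query atoms to distinct molecule atoms, equal atom counts plus a match means every atom is used exactly once.

```python
from rdkit import Chem

def covers_all_atoms(mol, query):
    """Hypothetical helper: True if the query matches and uses every atom once."""
    if query.GetNumAtoms() != mol.GetNumAtoms():
        return False  # the query leaves gaps (or is too large)
    return mol.HasSubstructMatch(query)

mol = Chem.MolFromSmiles('CCC=C')
print(covers_all_atoms(mol, Chem.MolFromSmarts('[$(C-*)].CC.[$(C=*)]')))  # True
print(covers_all_atoms(mol, Chem.MolFromSmarts('[$(C-*)].[$(C=*)]')))     # False: 2 atoms vs 4
```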

Hope this helps,
Ivan



On Fri, Mar 5, 2021 at 7:36 AM Thomas  wrote:

> Is it possible to search for a fragment that is not a valid structure
> itself, but part of a structure?
>
> Problem: "Given a structure, and a decomposition of the structure,
> highlight each part with a different color"
> The decomposition is always in the form of 1 SMILES and n SMILES FRAGMENTS
> The "smiles fragments" are noted with an asterisk in the "connection
> bonds".
>
> For example:
> mol: CCC=C
> decomposition:  C*   CC   *=C
>
> For a human it takes nothing to spot "who is who", but how would you
> approach it?
>
> - I cannot match the SMARTS "C=": it's not a valid SMARTS
> - I cannot match it without the broken bonds: I would lose the difference
> between C* and C=*
> - I cannot match it like it is: the asterisks will match the first atom of
> the other fragment. (Maybe is there a way to get which part matched with
> who? In that case I could remove the atom matching the asterisk...)
>
> Maybe there is an easy way to represent this pattern 'C=' in SMARTS, but
> the daylight manual is not clear about it. Or maybe I'm just too lazy to
> get it
>
> In other words: is it possible to write n SMARTS that together match the
> whole structure (all the atoms and all the bonds, with no overlapping and
> no gaps)? Because if the SMARTS must be a complete structure (without
> "unbonded" bonds), that's actually not possible.
> Thank you


[Rdkit-discuss] SMARTS representing a fragment (with "unbonded" bonds)

2021-03-05 Thread Thomas
Is it possible to search for a fragment that is not a valid structure
itself, but part of a structure?

Problem: "Given a structure, and a decomposition of the structure,
highlight each part with a different color"
The decomposition is always in the form of 1 SMILES and n SMILES FRAGMENTS
The SMILES fragments are marked with an asterisk at the "connection bonds".

For example:
mol: CCC=C
decomposition:  C*   CC   *=C

For a human it is trivial to spot "who is who", but how would you
approach it?

- I cannot match the SMARTS "C=": it's not a valid SMARTS.
- I cannot match it without the broken bonds: I would lose the difference
between C* and C=*.
- I cannot match it as it is: the asterisk will match the first atom of
the other fragment. (Maybe there is a way to get which part matched which
atom? In that case I could remove the atom matching the asterisk...)
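On the parenthetical question: a match tuple is ordered like the query atoms, so the atoms matched by the asterisk can be identified and dropped. A sketch, assuming the starred fragment is parsed with MolFromSmarts (where * is a wildcard query atom with atomic number 0); uniquify=False is passed so that every atom mapping is listed, not one per atom set.

```python
from rdkit import Chem

mol = Chem.MolFromSmiles('CCC=C')
# Starred fragment parsed as SMARTS: '*' is a wildcard atom
frag = Chem.MolFromSmarts('*=C')

# Positions of the '*' wildcards among the query atoms
star_positions = {a.GetIdx() for a in frag.GetAtoms() if a.GetAtomicNum() == 0}

for match in mol.GetSubstructMatches(frag, uniquify=False):
    # match[pos] is the molecule atom matched by query atom pos
    own = [idx for pos, idx in enumerate(match) if pos not in star_positions]
    print(match, '-> fragment atoms', own)
```

Both mappings of *=C onto the C=C bond are valid, so a fragment on its own can be ambiguous; matching all the pieces in one disconnected query, as in Ivan's reply, is one way to force a consistent assignment.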

Maybe there is an easy way to represent the pattern 'C=' in SMARTS, but
the Daylight manual is not clear about it. Or maybe I'm just too lazy to
get it.

In other words: is it possible to write n SMARTS that together match the
whole structure (all the atoms and all the bonds, with no overlap and
no gaps)? Because if each SMARTS must be a complete structure (without
"unbonded" bonds), that's actually not possible.
Thank you


Re: [Rdkit-discuss] Drawing options: noCarbonSymbols = False

2021-03-05 Thread Thomas
Thank you, I was looking for more readable images: for example in OpenBabel
"the default is to draw all hetero atoms and terminal C explicitly,
together with their attched hydrogens."
Basically I'm approaching rdkit right now, so I was wondering if there was
something wrong in my approach. "Not available" is quite an answer, but
this leads me to new questions:

1) What is the "official" way to pass drawing options?
2) Where can I find up-to-date documentation on the available drawing
options? (https://www.rdkit.org/docs/source/rdkit.Chem.Draw.MolDrawing.html
is apparently outdated, since "noCarbonSymbol" is still listed there.)
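A minimal sketch of setting options with the new drawing code, assuming a recent RDKit build: options are attributes of the MolDrawOptions object returned by drawOptions(), set before DrawMolecule() is called. addAtomIndices is used only as an example option, and MolDraw2DSVG is used so that no Cairo support is needed.

```python
from rdkit import Chem
from rdkit.Chem.Draw import rdMolDraw2D

mol = Chem.MolFromSmiles('CCC=C')

drawer = rdMolDraw2D.MolDraw2DSVG(300, 200)
options = drawer.drawOptions()   # MolDrawOptions, modified in place
options.addAtomIndices = True    # example option: label each atom with its index
drawer.DrawMolecule(mol)         # options must be set before drawing
drawer.FinishDrawing()
svg = drawer.GetDrawingText()    # SVG markup as a string
```

Since the documentation may lag behind the code, listing the attributes with dir(drawer.drawOptions()) in the installed version is one way to see which options actually exist.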

It's just that outdated documentation can be frustrating: I spent
hours trying to understand what I was doing wrong.
For example, say I want all atoms and bonds in black. I found a snippet
in your cookbook that requires importing IPython for that: is that really
necessary?
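For the all-black case, a sketch that does not import IPython, assuming the installed RDKit exposes MolDrawOptions.useBWAtomPalette(), which switches the atom colour palette to black and white before drawing:

```python
from rdkit import Chem
from rdkit.Chem.Draw import rdMolDraw2D

mol = Chem.MolFromSmiles('OCC=N')  # heteroatoms would normally be coloured

drawer = rdMolDraw2D.MolDraw2DSVG(300, 200)
drawer.drawOptions().useBWAtomPalette()  # black atom labels and bonds
drawer.DrawMolecule(mol)
drawer.FinishDrawing()
svg = drawer.GetDrawingText()
```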
Thank you

On Thu, Mar 4, 2021 at 06:52 Greg Landrum
wrote:

> Hi Thomas,
>
> noCarbonSymbols was an option for the old drawing code and is not
> currently available in the new drawing code since it didn't seem to be
> particularly useful.
>
> We can maybe provide suggestions if you provide an example image that
> you're getting and explain what you'd like to change about it.
>
> Best,
> -greg
>
>
> On Tue, Mar 2, 2021 at 8:49 PM Thomas  wrote:
>
>> Hi everybody,
>>  I'm very new to rdkit, can anybody explain to me how to set drawing
>> options?
>> I wanted to do something easy like:
>>
>> MolToImage(mol, noCarbonSymbol=False)
>>
>> but apparently that is not the right way. Following some recipes, I also
>> tried:
>>
>> drawer = rdMolDraw2D.MolDraw2DCairo(100, 100)
>> drawer.drawOptions().noCarbonSymbols = False
>>
>> but again no luck. I tried also to modify the MolDrawOptions object
>>
>> options = drawer.drawOptions()
>>
>> but there is nothing about "noCarbonSymbol"...
>>
>> Can anybody point me in the right direction? Much appreciated.
>> Thomas
>