Re: [Rdkit-discuss] Masking groups as atoms in RDKit

2017-09-29 Thread Jan Holst Jensen

Hi Kovas,

Greg has precisely pointed out the major problem of collapsing fragments 
into single atoms: Searching and comparing structures.


With that warning in mind: I use pseudo atoms (e.g. "Ala", "Arg",...) to 
good effect to represent amino acids in peptides and proteins. My 
colleague Esben Bjerrum has done custom builds of RDKit where the 
atomic_data.cpp file was changed to add the 22 natural amino acids.


The rest of RDKit handles the new atoms surprisingly well. The new atoms 
can also be used in SMARTS queries as long as you reference them by 
atomic number (and Greg's caution about searching applies doubly in that 
case).
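
A stock RDKit build has no "Ala" or "Arg" atoms, so the custom-build behaviour cannot be shown directly. As a rough sketch of what "reference them by atomic number" means, the example below uses a dummy atom (atomic number 0) as a stand-in for a pseudo atom and matches it with `[#0]`. The molecule and variable names are illustrative only.

```python
from rdkit import Chem

# A dummy atom (atomic number 0) stands in for a custom pseudo atom here;
# a stock RDKit build has no "Ala" or "Arg" element.
mol = Chem.MolFromSmiles('CC[*:1]')

# [#0] selects atoms by atomic number rather than by element symbol.
query = Chem.MolFromSmarts('[#6][#0]')
matches = mol.GetSubstructMatches(query)
print(matches)
```

Each match lists target atom indices in the order of the query atoms, so the carbon comes first and the dummy atom second.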


So, yes, that's one way of doing it. Just don't expect anyone else to be 
able to interpret your molfiles reliably :-).



You write that you want to mask away the macromolecule part since you 
are not going to interact with it. In that case it sounds like it is OK 
to throw away the underlying chemistry of the macromolecule and 
substitute a label for depiction. I would then go with Greg's suggestion 
to use dummy atoms and labels, e.g.


   from rdkit import Chem

   m = Chem.MolFromSmiles('CC[*:1]')
   # Put a molfile alias (label) on the star atom, which has index 2.
   m.GetAtomWithIdx(2).SetProp("molFileAlias", "Macromol-section")

   # The alias is written as an "A" record in the mol block.
   print(Chem.MolToMolBlock(m))

   PRINT OUTPUT:

     RDKit

  3  2  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 R   0  0  0  0  0  1  0  0  0  1  0  0
  1  2  1  0
  2  3  1  0
A    3
Macromol-section
M  END
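
To check that the label survives a round trip, the mol block can be parsed back and the property read again. This is a minimal sketch, assuming the RDKit mol file reader stores "A" records under the same `molFileAlias` property used when writing; other toolkits may treat the alias differently.

```python
from rdkit import Chem

m = Chem.MolFromSmiles('CC[*:1]')
m.GetAtomWithIdx(2).SetProp("molFileAlias", "Macromol-section")

# Write the mol block, then parse it again as another program would.
block = Chem.MolToMolBlock(m)
m2 = Chem.MolFromMolBlock(block)

# Look for the alias on the re-read dummy atom.
star = m2.GetAtomWithIdx(2)
alias = star.GetProp("molFileAlias") if star.HasProp("molFileAlias") else None
print(star.GetAtomicNum(), alias)
```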

If you paste that molfile into MarvinSketch you see this (different 
tools will show labels in different ways):




I am very much a molfile guy, so I don't know if labels can be carried 
over to RDKit SMILES strings.
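
One workaround that does not depend on the alias being written to SMILES is to rely on the atom map number, which is part of the SMILES string, and keep the label text in a separate lookup. The sketch below is only an illustration; the `labels` dictionary is a hypothetical helper, not an RDKit feature.

```python
from rdkit import Chem

# Hypothetical lookup table: atom map number -> label for the masked section.
labels = {1: "Macromol-section"}

m = Chem.MolFromSmiles('CC[*:1]')
smiles = Chem.MolToSmiles(m)  # the ":1" map number is kept in the SMILES

# Recover the label after a round trip through SMILES.
m2 = Chem.MolFromSmiles(smiles)
recovered = {
    atom.GetIdx(): labels[atom.GetAtomMapNum()]
    for atom in m2.GetAtoms()
    if atom.GetAtomMapNum() in labels
}
print(smiles, recovered)
```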


Cheers
-- Jan


Re: [Rdkit-discuss] Masking groups as atoms in RDKit

2017-09-28 Thread Greg Landrum
I'm afraid that there's likely to be rather a lot of devil hiding in the
details (as is so often the case).

A simple example of one problem: let's take your [But]O case. Suppose you
do a substructure search for the molecule defined by the SMARTS "OCC". Does
that match "[But]O"?  What does it return when I ask for the substructure
matches (this function, if you aren't familiar with it, returns the indices
of the matching atoms)? What about the SMARTS "CC"?
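
For reference, this is what the substructure matches look like when every atom is present. The sketch assumes the example molecule is 1-butanol written as the SMILES CCCCO.

```python
from rdkit import Chem

mol = Chem.MolFromSmiles('CCCCO')  # atoms 0-3 are carbons, atom 4 is oxygen

# Each match is a tuple of atom indices, ordered like the atoms in the query.
occ = mol.GetSubstructMatches(Chem.MolFromSmarts('OCC'))
cc = mol.GetSubstructMatches(Chem.MolFromSmarts('CC'))
print(occ)
print(len(cc))
```

With a single [But] atom in place of the chain there are no carbon indices to return, which is the ambiguity raised above.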

One solution to this that works with substructure searching is to have the
molecule contain all the atoms - "CCCCO" in your example - but to have the
four C atoms marked as a group so that drawings of the molecule display
"[But]O". Supporting this type of functionality is on the To Do list (it's
part of supporting S Groups from Mol files).
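
Until something like that is supported, a rough approximation is to keep all atoms and tag the group members with an atom property. This is only a sketch: "maskGroup" is a made-up property name, and nothing in RDKit draws or interprets it.

```python
from rdkit import Chem

mol = Chem.MolFromSmiles('CCCCO')

# Tag the four carbons as one group. "maskGroup" is a made-up property name.
for idx in range(4):
    mol.GetAtomWithIdx(idx).SetProp("maskGroup", "But")

# Substructure search still sees every atom...
match = mol.GetSubstructMatch(Chem.MolFromSmarts('OCC'))

# ...and each hit can be reported in terms of the group it falls in.
described = [
    mol.GetAtomWithIdx(i).GetProp("maskGroup")
    if mol.GetAtomWithIdx(i).HasProp("maskGroup")
    else mol.GetAtomWithIdx(i).GetSymbol()
    for i in match
]
print(match, described)
```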

If you just want to indicate that there is a [But] group there but not
really do anything with the group's structure, there are probably already
ways to handle this using dummy atoms and custom labels.

-greg




--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Masking groups as atoms in RDKit

2017-09-28 Thread Kovas Palunas
Ideally, I'd like to treat these pseudoatoms as similarly to normal atoms as 
possible.  I would mostly want to use them for substructure matching, running 
reactions, and also display purposes.  Also, basic atom queries, such as 
getting a mapping number or an atom symbol.
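
For comparison, these basic atom queries already work on a dummy atom in a stock RDKit build. A short sketch, using a dummy atom with map number 1 as a stand-in for a pseudoatom:

```python
from rdkit import Chem

mol = Chem.MolFromSmiles('CC[*:1]')
atom = mol.GetAtomWithIdx(2)

symbol = atom.GetSymbol()         # symbol of the dummy atom
atomic_num = atom.GetAtomicNum()  # atomic number of the dummy atom
map_num = atom.GetAtomMapNum()    # mapping number taken from ":1"
print(symbol, atomic_num, map_num)
```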

I was thinking that maybe this could be done by just defining the CoA atom type 
(for example) just as the carbon or oxygen atom types are defined (setting 
atomic weight, valences, etc.).

Does this make sense?

 - Kovas




Re: [Rdkit-discuss] Masking groups as atoms in RDKit

2017-09-28 Thread Greg Landrum
There's currently no way to add to the periodic table. I'm somewhat
uncomfortable with the idea (there's a lot that can go wrong), but your use
case isn't that uncommon (using custom atom types to represent amino acids
has come up before), so it's worth thinking about how to do something like
this.
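
For context, the periodic table is exposed in Python as a lookup object. The sketch below shows a few of its read-only queries and nothing more.

```python
from rdkit import Chem

pt = Chem.GetPeriodicTable()

symbol = pt.GetElementSymbol(6)   # atomic number -> symbol
number = pt.GetAtomicNumber('C')  # symbol -> atomic number
weight = pt.GetAtomicWeight(6)    # atomic number -> atomic weight
print(symbol, number, weight)
```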

-greg



Re: [Rdkit-discuss] Masking groups as atoms in RDKit

2017-09-28 Thread Kovas Palunas
The way I was thinking about it, the SMARTS OCC would not match O[But] 
because [But] is a totally new atom that is not related to carbon at all.  This 
doesn't really make sense in this example, but it does (I think) for most of my 
purposes (where I want to mask away a biological macromolecule that I do not 
want to interact with).
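
A dummy atom already behaves roughly this way in a stock build. The sketch below uses `[*:1]O` as a stand-in for O[But]; the stand-in is an assumption, not the custom atom type being proposed.

```python
from rdkit import Chem

masked = Chem.MolFromSmiles('[*:1]O')  # dummy atom stands in for [But]

# A carbon query does not match the dummy atom.
matches_occ = masked.HasSubstructMatch(Chem.MolFromSmarts('OCC'))
# A wildcard query, or one that asks for atomic number 0, does.
matches_any = masked.HasSubstructMatch(Chem.MolFromSmarts('O*'))
matches_dummy = masked.HasSubstructMatch(Chem.MolFromSmarts('O[#0]'))
print(matches_occ, matches_any, matches_dummy)
```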

There are probably still edge cases I'm not seeing... but maybe it's still 
worth a try?  I saw there was a periodic table module in RDKit.  Is it possible 
to add these atoms there?

- Kovas




Re: [Rdkit-discuss] Masking groups as atoms in RDKit

2017-09-27 Thread Greg Landrum
Where would you want to use this?
Is it for depiction (i.e. drawing molecules) or something else?

-greg




[Rdkit-discuss] Masking groups as atoms in RDKit

2017-09-27 Thread Kovas Palunas
Hi all,


Has anyone tried implementing or using a group-to-atom masking strategy in 
RDKit?  By this I mean taking a piece of a molecule and representing it as a 
single atom.  Here is an example:


CCCCO could be represented as [But]O, where the atom [But] represents the 
four-carbon chain.
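
One way to approximate this in a stock RDKit build is to swap the chain for a dummy atom with `Chem.ReplaceSubstructs`. This is a sketch under assumptions: the dummy atom `[*:1]` stands in for [But], and the query lists the carbon bonded to oxygen first because, as far as I understand it, the replacement is reconnected through the first matched atom's neighbours.

```python
from rdkit import Chem

mol = Chem.MolFromSmiles('CCCCO')

# The piece to mask; the carbon bonded to oxygen is listed first.
group = Chem.MolFromSmarts('[C;$(CO)]CCC')
# Dummy atom standing in for [But].
placeholder = Chem.MolFromSmiles('[*:1]')

# ReplaceSubstructs returns a tuple of molecules; take the first result.
masked = Chem.ReplaceSubstructs(mol, group, placeholder)[0]
masked_atoms = sorted(a.GetAtomicNum() for a in masked.GetAtoms())
print(Chem.MolToSmiles(masked), masked_atoms)
```

The masked molecule no longer contains the chain, so any later search or comparison only sees the dummy atom.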


In my case I'm particularly interested in using this strategy to represent 
large biological molecules / molecule pieces, such as coenzyme A.


If I were to implement this myself, is there a place in RDKit where atom types 
can be defined?


Thanks!


 - Kovas
