Hi Kovas,

Greg has precisely pointed out the major problem of collapsing fragments into single atoms: Searching and comparing structures.

With that warning in mind: I use pseudo atoms (e.g. "Ala", "Arg",...) to good effect to represent amino acids in peptides and proteins. My colleague Esben Bjerrum has done custom builds of RDKit where the atomic_data.cpp file was changed to add the 22 natural amino acids.

The rest of RDKit handles the new atoms surprisingly well. The new atoms can also be used in SMARTS queries as long as you reference them by atomic number (and Greg's caution about searching applies doubly in that case).
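
As a rough illustration of what "reference them by atomic number" means in SMARTS, here is a sketch using stock RDKit and ordinary elements. The atomic numbers that a custom build assigns to pseudo atoms such as "Ala" are build-specific and not shown here; the `[#6][#8]` pattern is only a stand-in.

```python
from rdkit import Chem

# Ethanol: atoms 0 and 1 are carbon, atom 2 is oxygen.
mol = Chem.MolFromSmiles("CCO")

# [#6] and [#8] select atoms purely by atomic number (6 = C, 8 = O).
# A pseudo atom from a custom build would be addressed the same way,
# using whatever atomic number that build assigned to it (an assumption
# here, since stock RDKit has no such atoms).
patt = Chem.MolFromSmarts("[#6][#8]")
matches = mol.GetSubstructMatches(patt)
print(matches)
```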

So, yes, that's one way of doing it. Just don't expect anyone else to be able to interpret your molfiles reliably :-).

You write that you want to mask away the macromolecule part since you are not going to interact with it. In that case it sounds like it is OK to throw away the underlying chemistry of the macromolecule and substitute a label for depiction. I would then go with Greg's suggestion to use dummy atoms and labels, e.g.

   from rdkit import Chem

   m = Chem.MolFromSmiles('CC[*:1]')
   # Put a molfile label (alias) on the star atom, which has index 2.
   m.GetAtomWithIdx(2).SetProp("molFileAlias", "Macromol-section")
   # Write the molecule out as a molfile; the alias shows up as an "A" record.
   print(Chem.MolToMolBlock(m))




      3  2  0  0  0  0  0  0  0  0999 V2000
        0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0 0  0  0  0
        0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0 0  0  0  0
        0.0000    0.0000    0.0000 R   0  0  0  0  0  1  0  0 0  1  0  0
      1  2  1  0
      2  3  1  0
   A    3
   Macromol-section
   M  END

If you paste that molfile into MarvinSketch you see this (different tools will show labels in different ways):

I am very much a molfile guy, so I don't know if labels can be carried over to RDKit SMILES strings.

-- Jan

On 2017-09-28 08:00, Kovas Palunas wrote:
The way I was thinking about it, the SMARTS OCC would not match O[But], because [But] is a totally new atom that is not related to carbon at all.  This doesn't really make sense in this example, but it does (I think) for most of my purposes (where I want to mask away a biological macromolecule that I do not want to interact with).
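
A minimal sketch of that matching behaviour, using a dummy atom in stock RDKit as a stand-in for the hypothetical [But] atom. The SMILES `O[*:1]` and the variable names are illustrative assumptions, not something taken from the thread.

```python
from rdkit import Chem

# The full molecule and a "masked" version where the butyl chain is
# replaced by a single dummy atom (atomic number 0).
full = Chem.MolFromSmiles("CCCCO")
masked = Chem.MolFromSmiles("O[*:1]")

patt = Chem.MolFromSmarts("OCC")

# The carbon query atoms cannot match the dummy atom, so the masked
# molecule is not a hit, while the full molecule is.
full_hit = full.HasSubstructMatch(patt)
masked_hit = masked.HasSubstructMatch(patt)

# The dummy atom can still be found explicitly by atomic number 0.
dummy_matches = masked.GetSubstructMatches(Chem.MolFromSmarts("[#0]"))
print(full_hit, masked_hit, dummy_matches)
```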

There are probably still edge cases I'm not seeing... but maybe it's still worth a try?  I saw there was a periodic table module in RDKit.  Is it possible to add these atoms there?

- Kovas

From: Greg Landrum
Sent: Wednesday, September 27, 10:13 PM
Subject: Re: [Rdkit-discuss] Masking groups as atoms in RDKit
To: Kovas Palunas
Cc: rdkit-discuss@lists.sourceforge.net

I'm afraid that there's likely to be rather a lot of devil hiding in the details (as is so often the case).

A simple example of one problem: let's take your [But]O case. Suppose you do a substructure search for the molecule defined by the SMARTS "OCC". Does that match "[But]O"?  What does it return when I ask for the substructure matches (this function, if you aren't familiar with it, returns the indices of the matching atoms)? What about the SMARTS "CC"?
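
For reference, this is roughly what those two searches return on the fully specified molecule "CCCCO". It is only a sketch on the unmasked structure; what the answers should be for "[But]O" is exactly the open question.

```python
from rdkit import Chem

# Atoms 0-3 are the four carbons, atom 4 is the oxygen.
mol = Chem.MolFromSmiles("CCCCO")

# Each match is a tuple of atom indices in the molecule, ordered by the
# atoms of the query.
occ_matches = mol.GetSubstructMatches(Chem.MolFromSmarts("OCC"))
cc_matches = mol.GetSubstructMatches(Chem.MolFromSmarts("CC"))

print(occ_matches)
print(cc_matches)
```

With a single [But] atom there are no carbon indices left to return, which is why both queries become ambiguous once the chain is collapsed.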

One solution to this that works with substructure searching is to have the molecule contain all the atoms - "CCCCO" in your example - but to have the four C atoms marked as a group so that drawings of the molecule display "[But]O". Supporting this type of functionality is on the To Do list (it's part of supporting S Groups from Mol files).
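
A rough sketch of the "keep all the atoms, but mark the group" idea, using plain atom properties as a stand-in. This is not the S Group support mentioned above; the property name `group_label` is a hypothetical choice for illustration, and nothing here affects how the molecule is drawn.

```python
from rdkit import Chem

mol = Chem.MolFromSmiles("CCCCO")

# Tag the four carbons as belonging to a "But" group.
# "group_label" is an arbitrary, hypothetical property name.
for idx in range(4):
    mol.GetAtomWithIdx(idx).SetProp("group_label", "But")

# Substructure searching still sees every atom.
match = mol.GetSubstructMatch(Chem.MolFromSmarts("OCC"))

# Afterwards, the tags tell us which matched atoms sit inside the group.
in_group = [i for i in match if mol.GetAtomWithIdx(i).HasProp("group_label")]
print(match, in_group)
```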

If you just want to indicate that there is a [But] group there but not really do anything with the group's structure, there are probably already ways to handle this using dummy atoms and custom labels.


On Wed, Sep 27, 2017 at 9:26 PM, Kovas Palunas <kovas.palu...@arzeda.com> wrote:
Ideally, I'd like to treat these pseudoatoms as similarly to normal atoms as possible.  I would mostly want to use them for substructure matching, running reactions, and also display purposes.  Also, basic atom queries, such as getting a mapping number or an atom symbol.

I was thinking that maybe this could be done by just defining the CoA atom type (for example) just as the carbon or oxygen atom types are defined (setting atomic weight, valences, etc.).
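
For context, this is the kind of per-element data that stock RDKit exposes for real elements through its periodic table object. The sketch only reads existing entries; it does not show a way to add a new atom type such as CoA.

```python
from rdkit import Chem

pt = Chem.GetPeriodicTable()

# Look up the data RDKit stores for carbon (atomic number 6).
symbol = pt.GetElementSymbol(6)
weight = pt.GetAtomicWeight(6)
default_valence = pt.GetDefaultValence(6)

# Lookups also work in the other direction, from symbol to number.
oxygen_number = pt.GetAtomicNumber("O")

print(symbol, weight, default_valence, oxygen_number)
```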

Does this make sense?

 - Kovas
From: Greg Landrum <greg.land...@gmail.com>
Sent: Wednesday, September 27, 2017 2:27:04 AM
To: Kovas Palunas
Cc: rdkit-discuss@lists.sourceforge.net
Subject: Re: [Rdkit-discuss] Masking groups as atoms in RDKit

Where would you want to use this?
Is it for depiction (i.e. drawing molecules) or something else?


On Tue, Sep 26, 2017 at 10:12 PM, Kovas Palunas <kovas.palu...@arzeda.com> wrote:
Hi all,

Has anyone tried implementing or using a group to atom masking strategy in RDKit?  By this I mean taking a piece of a molecule and representing it as a single atom.  Here is an example:

CCCCO  could be represented as  [But]O, where the atom [But] represents the four carbon chain.

In my case I'm particularly interested in using this strategy to represent large biological molecules / molecule pieces, such as coenzyme A.

If I were to implement this myself, is there a place in RDKit where atom types can be defined?


 - Kovas
