Hi Kovas,
Greg has pinpointed the major problem with collapsing fragments
into single atoms: searching and comparing structures.
With that warning in mind: I use pseudo atoms (e.g. "Ala", "Arg",...) to
good effect to represent amino acids in peptides and proteins. My
colleague Esben Bjerrum has done custom builds of RDKit where the
atomic_data.cpp file was changed to add the 22 natural amino acids.
The rest of RDKit handles the new atoms surprisingly well. The new atoms
can also be used in SMARTS queries as long as you reference them by
atomic number (and Greg's caution about searching applies doubly in that
case).
So, yes, that's one way of doing it. Just don't expect anyone else to be
able to interpret your molfiles reliably :-).
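For what it's worth, here is a minimal sketch of what "reference them by atomic number" looks like in SMARTS. I can't reproduce the custom build here, so this assumes a stock RDKit and uses the one non-element that is always available, the dummy atom with atomic number 0; in a custom build the number would be whatever was assigned in atomic_data.cpp.

```python
from rdkit import Chem

# Stock RDKit: the masked part is a dummy atom (atomic number 0).
# A custom amino-acid atom would use its own atomic number instead (assumption).
mol = Chem.MolFromSmiles('CC*')

# Reference the atom by atomic number rather than by symbol.
dummy_query = Chem.MolFromSmarts('[#0]')
dummy_matches = mol.GetSubstructMatches(dummy_query)
print(dummy_matches)  # ((2,),)
```

The query finds the pseudo atom itself, but nothing about the chemistry it stands for, which is exactly Greg's caution.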
You write that you want to mask away the macromolecule part since you
are not going to interact with it. In that case it sounds like it is OK
to throw away the underlying chemistry of the macromolecule and
substitute a label for depiction. I would then go with Greg's suggestion
to use dummy atoms and labels, e.g.
from rdkit import Chem

# Ethyl group attached to a dummy ("star") atom with atom map number 1.
m = Chem.MolFromSmiles('CC[*:1]')
# Put a molfile label on the star atom (atom index 2).
m.GetAtomWithIdx(2).SetProp("molFileAlias", "Macromol-section")
print(Chem.MolToMolBlock(m))
PRINT OUTPUT:

     RDKit

  3  2  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 R   0  0  0  0  0  1  0  0  0  1  0  0
  1  2  1  0
  2  3  1  0
A    3
Macromol-section
M  END
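A quick way to check that the label really travels with the molfile is to read the block back in. This is only a sketch: it uses 'CC*' without the atom map number to keep things simple, and it assumes the reader stores the alias line under the same "molFileAlias" property that the writer uses.

```python
from rdkit import Chem

# Same idea as above, without the atom map number (simplification).
m = Chem.MolFromSmiles('CC*')
m.GetAtomWithIdx(2).SetProp('molFileAlias', 'Macromol-section')

# Write the molfile and parse it again.
block = Chem.MolToMolBlock(m)
roundtrip = Chem.MolFromMolBlock(block)

# Look for the alias on the re-read star atom.
star = roundtrip.GetAtomWithIdx(2)
alias = star.GetProp('molFileAlias') if star.HasProp('molFileAlias') else None
print(star.GetAtomicNum(), alias)
```

The star atom comes back as a dummy atom (atomic number 0), so the label is purely cosmetic and carries no chemistry.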
If you paste that molfile into MarvinSketch you see this (different
tools will show labels in different ways):
[screenshot not preserved in the archive]
I am very much a molfile guy, so I don't know if labels can be carried
over to RDKit SMILES strings.
Cheers
-- Jan
On 2017-09-28 08:00, Kovas Palunas wrote:
The way I was thinking about it, the SMARTS OCC would not match
O[But] because [But] is a totally new atom that is not related to
carbon at all. This doesn't really make sense in this example, but it
does (I think) for most of my purposes (where I want to mask away a
biological macromolecule that I do not want to interact with).
There are probably still edge cases I'm not seeing... but maybe it's
still worth a try? I saw there was a periodic table module in RDKit.
Is it possible to add these atoms there?
- Kovas
From: Greg Landrum
Sent: Wednesday, September 27, 10:13 PM
Subject: Re: [Rdkit-discuss] Masking groups as atoms in RDKit
To: Kovas Palunas
Cc: rdkit-discuss@lists.sourceforge.net
I'm afraid that there's likely to be rather a lot of devil hiding in
the details (as is so often the case).
A simple example of one problem: let's take your [But]O case. Suppose
you do a substructure search for the molecule defined by the SMARTS
"OCC". Does that match "[But]O"? What does it return when I ask for
the substructure matches (this function, if you aren't familiar with
it, returns the indices of the matching atoms)? What about the SMARTS
"CC"?
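To make the question concrete, here is a small sketch comparing the explicit molecule with a masked one. Stock RDKit has no [But] atom, so the masked version uses a dummy atom ('*O') as a stand-in; that substitution is an assumption for illustration only.

```python
from rdkit import Chem

full = Chem.MolFromSmiles('CCCCO')   # explicit butanol
masked = Chem.MolFromSmiles('*O')    # butyl collapsed into one dummy atom (stand-in for [But])

results = {}
for smarts in ('OCC', 'CC'):
    query = Chem.MolFromSmarts(smarts)
    # GetSubstructMatches returns tuples of matching atom indices.
    results[smarts] = (full.GetSubstructMatches(query),
                       masked.GetSubstructMatches(query))
    print(smarts, results[smarts])
```

With the explicit molecule both queries return atom indices; with the collapsed atom there are no carbons left to match, so both come back empty.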
One solution to this that works with substructure searching is to have
the molecule contain all the atoms - "CCCCO" in your example - but to
have the four C atoms marked as a group so that drawings of the
molecule display "[But]O". Supporting this type of functionality is on
the To Do list (it's part of supporting S Groups from Mol files).
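Until that is supported, the bookkeeping half of the idea can be sketched with plain atom properties. The property name "maskGroup" and the helper groups_hit are made up for this example, and nothing here changes how the molecule is drawn.

```python
from rdkit import Chem

mol = Chem.MolFromSmiles('CCCCO')

# Hypothetical convention: tag the four chain carbons as belonging to "But".
for idx in range(4):
    mol.GetAtomWithIdx(idx).SetProp('maskGroup', 'But')

def groups_hit(mol, smarts):
    """For each match, pair the atom indices with each atom's group label (or None)."""
    query = Chem.MolFromSmarts(smarts)
    hits = []
    for match in mol.GetSubstructMatches(query):
        labels = tuple(
            mol.GetAtomWithIdx(i).GetProp('maskGroup')
            if mol.GetAtomWithIdx(i).HasProp('maskGroup') else None
            for i in match
        )
        hits.append((match, labels))
    return hits

occ_hits = groups_hit(mol, 'OCC')
print(occ_hits)  # [((4, 3, 2), (None, 'But', 'But'))]
```

Substructure searching still sees every atom, and the labels tell you afterwards which matched atoms sit inside the masked group.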
If you just want to indicate that there is a [But] group there but not
really do anything with the group's structure, there are probably
already ways to handle this using dummy atoms and custom labels.
-greg
On Wed, Sep 27, 2017 at 9:26 PM, Kovas Palunas
<kovas.palu...@arzeda.com> wrote:
Ideally, I'd like to treat these pseudoatoms as similarly to normal
atoms as possible. I would mostly want to use them for substructure
matching, running reactions, and also display purposes. Also, basic
atom queries, such as getting a mapping number or an atom symbol.
I was thinking that maybe this could be done by just defining the CoA
atom type (for example) just as the carbon or oxygen atom types are
defined (setting atomic weight, valences, etc.).
Does this make sense?
- Kovas
From: Greg Landrum <greg.land...@gmail.com>
Sent: Wednesday, September 27, 2017 2:27:04 AM
To: Kovas Palunas
Cc: rdkit-discuss@lists.sourceforge.net
Subject: Re: [Rdkit-discuss] Masking groups as atoms in RDKit
Where would you want to use this?
Is it for depiction (i.e. drawing molecules) or something else?
-greg
On Tue, Sep 26, 2017 at 10:12 PM, Kovas Palunas
<kovas.palu...@arzeda.com> wrote:
Hi all,
Has anyone tried implementing or using a group to atom masking
strategy in RDKit? By this I mean taking a piece of a molecule and
representing it as a single atom. Here is an example:
CCCCO could be represented as [But]O, where the atom [But]
represents the four carbon chain.
In my case I'm particularly interested in using this strategy to
represent large biological molecules / molecule pieces, such as
coenzyme A.
If I were to implement this myself, is there a place in RDKit where
atom types can be defined?
Thanks!
- Kovas