Re: [Rdkit-discuss] Masking groups as atoms in RDKit
Hi Kovas,

Greg has precisely pointed out the major problem of collapsing fragments into single atoms: searching and comparing structures.

With that warning in mind: I use pseudo atoms (e.g. "Ala", "Arg", ...) to good effect to represent amino acids in peptides and proteins. My colleague Esben Bjerrum has done custom builds of RDKit where the atomic_data.cpp file was changed to add the 22 natural amino acids. The rest of RDKit handles the new atoms surprisingly well. The new atoms can also be used in SMARTS queries as long as you reference them by atomic number (and Greg's caution about searching applies doubly in that case). So, yes, that's one way of doing it. Just don't expect anyone else to be able to interpret your molfiles reliably :-).

You write that you want to mask away the macromolecule part since you are not going to interact with it. In that case it sounds like it is OK to throw away the underlying chemistry of the macromolecule and substitute a label for depiction. I would then go with Greg's suggestion to use dummy atoms and labels, e.g.

from rdkit import Chem

m = Chem.MolFromSmiles('CC[*:1]')
# Put a molfile label on the star atom (atom index 2).
m.GetAtomWithIdx(2).SetProp("molFileAlias", "Macromol-section")
print(Chem.MolToMolBlock(m))

PRINT OUTPUT:

     RDKit

  3  2  0  0  0  0  0  0  0  0999 V2000
    0.    0.    0. C   0  0  0  0  0  0  0  0  0  0  0  0
    0.    0.    0. C   0  0  0  0  0  0  0  0  0  0  0  0
    0.    0.    0. R   0  0  0  0  0  1  0  0  0  1  0  0
  1  2  1  0
  2  3  1  0
A    3
Macromol-section
M  END

If you paste that molfile into MarvinSketch you see this (different tools will show labels in different ways):

[MarvinSketch screenshot not preserved in the archive]

I am very much a molfile guy, so I don't know if labels can be carried over to RDKit SMILES strings.

Cheers
-- Jan

On 2017-09-28 08:00, Kovas Palunas wrote:
> [...]
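For anyone who wants to check that the label survives a round trip, here is a minimal sketch (assuming a reasonably recent RDKit; the re-read step is an illustration added here, not part of Jan's recipe). It writes the aliased dummy atom to a molfile and parses the block back; RDKit keeps molfile atom aliases in the `molFileAlias` atom property, so the label should reappear on the same atom.

```python
from rdkit import Chem

# Dummy atom with atom-map number 1 standing in for the masked macromolecule.
m = Chem.MolFromSmiles('CC[*:1]')
m.GetAtomWithIdx(2).SetProp('molFileAlias', 'Macromol-section')

block = Chem.MolToMolBlock(m)

# Parse the molfile again; the "A" alias line is read back into molFileAlias.
m2 = Chem.MolFromMolBlock(block)
alias = m2.GetAtomWithIdx(2).GetProp('molFileAlias')
print(alias)
```

Only the label travels with the file; nothing about the chemistry behind it does, which is exactly the trade-off Jan describes.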
Re: [Rdkit-discuss] Masking groups as atoms in RDKit
I'm afraid that there's likely to be rather a lot of devil hiding in the details (as is so often the case).

A simple example of one problem: let's take your [But]O case. Suppose you do a substructure search for the molecule defined by the SMARTS "OCC". Does that match "[But]O"? What does it return when I ask for the substructure matches (this function, if you aren't familiar with it, returns the indices of the matching atoms)? What about the SMARTS "CC"?

One solution to this that works with substructure searching is to have the molecule contain all the atoms ("CCCCO" in your example) but to have the four C atoms marked as a group so that drawings of the molecule display "[But]O". Supporting this type of functionality is on the To Do list (it's part of supporting S Groups from Mol files).

If you just want to indicate that there is a [But] group there but not really do anything with the group's structure, there are probably already ways to handle this using dummy atoms and custom labels.

-greg

On Wed, Sep 27, 2017 at 9:26 PM, Kovas Palunas <kovas.palu...@arzeda.com> wrote:
> [...]
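Greg's suggestion of keeping every atom and marking the butyl carbons as a group can be sketched with the SubstanceGroup API that later RDKit releases added (it was still on the To Do list when this thread was written, so a recent RDKit version is an assumption here). The `SUP` type and `LABEL` property follow molfile S group conventions; whether a particular drawing tool then collapses the group to "[But]" is not shown.

```python
from rdkit import Chem

# Keep every atom of n-butanol so substructure search still sees the carbons.
mol = Chem.MolFromSmiles('CCCCO')

# Mark the four carbons (atom indices 0-3) as a superatom ("SUP") S group
# labelled "But". Assumes an RDKit release with the SubstanceGroup API.
sgroup = Chem.CreateMolSubstanceGroup(mol, 'SUP')
for idx in range(4):
    sgroup.AddAtomWithIdx(idx)
sgroup.SetProp('LABEL', 'But')

# The carbons are still real atoms, so "OCC" matches as usual.
patt = Chem.MolFromSmarts('OCC')
matches = mol.GetSubstructMatches(patt)
print(matches)  # ((4, 3, 2),)

group = Chem.GetMolSubstanceGroups(mol)[0]
print(group.GetProp('TYPE'), group.GetProp('LABEL'), list(group.GetAtoms()))
```

The group is purely annotation: searches and matches behave exactly as for plain CCCCO, which is why this answers Greg's "OCC" and "CC" questions without surprises.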
Re: [Rdkit-discuss] Masking groups as atoms in RDKit
Ideally, I'd like to treat these pseudoatoms as similarly to normal atoms as possible. I would mostly want to use them for substructure matching, running reactions, and display purposes, as well as basic atom queries such as getting a mapping number or an atom symbol.

I was thinking that maybe this could be done by just defining the CoA atom type (for example) in the same way the carbon or oxygen atom types are defined (setting atomic weight, valences, etc.).

Does this make sense?

- Kovas

From: Greg Landrum <greg.land...@gmail.com>
Sent: Wednesday, September 27, 2017 2:27:04 AM
To: Kovas Palunas
Cc: rdkit-discuss@lists.sourceforge.net
Subject: Re: [Rdkit-discuss] Masking groups as atoms in RDKit
[...]
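A minimal sketch of how far a plain dummy atom already goes toward that wish list, assuming a reasonably recent RDKit: the atom answers basic queries (symbol, atomic number, map number), carries a custom label, and passes unchanged through a reaction that only touches the OH. The "CoA" label and the acetylation reaction SMARTS are hypothetical examples, not anything proposed in the thread.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

# Dummy atom (atomic number 0) carrying map number 1 and a hypothetical label.
mol = Chem.MolFromSmiles('[*:1]O')
dummy = mol.GetAtomWithIdx(0)
dummy.SetProp('molFileAlias', 'CoA')
print(dummy.GetSymbol(), dummy.GetAtomicNum(), dummy.GetAtomMapNum(),
      dummy.GetProp('molFileAlias'))

# Hypothetical acetylation of an OH group; [*:1] matches any neighbour,
# including a dummy atom, and is copied unchanged into the product.
rxn = AllChem.ReactionFromSmarts('[*:1][OX2H1:2]>>[*:1][O:2]C(C)=O')
product = rxn.RunReactants((Chem.MolFromSmiles('*O'),))[0][0]
Chem.SanitizeMol(product)
print(Chem.MolToSmiles(product))
```

What a dummy cannot give you is element-like behaviour (atomic weight, valence rules) for the hidden fragment; that is the part that would need the periodic-table changes discussed later in the thread.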
Re: [Rdkit-discuss] Masking groups as atoms in RDKit
There's currently no way to add to the periodic table. I'm somewhat uncomfortable with the idea (there's a lot that can go wrong), but your use case isn't that uncommon (using custom atom types to represent amino acids has come up before), so it's worth thinking about how to do something like this.

-greg

On Thu, Sep 28, 2017 at 8:00 AM, Kovas Palunas <kovas.palu...@arzeda.com> wrote:
> [...]
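The periodic table Greg refers to can be inspected from Python, which also shows why an unknown symbol is a problem: a standard build has no entry for a made-up symbol such as `[But]`, so SMILES using it simply fails to parse. A small sketch, assuming a standard RDKit build (not the custom builds Jan mentions):

```python
from rdkit import Chem

pt = Chem.GetPeriodicTable()
# Built-in element data: atomic number, symbol and default valence.
print(pt.GetAtomicNumber('C'), pt.GetElementSymbol(8), pt.GetDefaultValence(6))  # 6 O 4

# "[But]" is not an element symbol, so the SMILES parser rejects it
# (MolFromSmiles returns None and logs a parse error).
masked = Chem.MolFromSmiles('[But]O')
print(masked is None)  # True
```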
Re: [Rdkit-discuss] Masking groups as atoms in RDKit
The way I was thinking about it, the SMARTS OCC would not match O[But], because [But] is a totally new atom that is not related to carbon at all. This doesn't really make sense in this example, but it does (I think) for most of my purposes, where I want to mask away a biological macromolecule that I do not want to interact with.

There are probably still edge cases I'm not seeing... but maybe it's still worth a try? I saw there was a periodic table module in RDKit. Is it possible to add these atoms there?

- Kovas

From: Greg Landrum
Sent: Wednesday, September 27, 10:13 PM
Subject: Re: [Rdkit-discuss] Masking groups as atoms in RDKit
To: Kovas Palunas
Cc: rdkit-discuss@lists.sourceforge.net
[...]
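The behaviour Kovas describes, where the masked group is "not related to carbon at all", is what an ordinary dummy atom already gives. A minimal sketch, assuming n-butanol (`CCCCO`) as the unmasked molecule and a dummy atom `*` standing in for [But]; it answers Greg's "OCC" and "CC" questions for both representations and adds an atomic-number query (`[#0]`) in the spirit of Jan's remark about referencing custom atoms by number:

```python
from rdkit import Chem

full = Chem.MolFromSmiles('CCCCO')  # every atom explicit
masked = Chem.MolFromSmiles('*O')   # butyl collapsed into a dummy (atomic number 0)

results = {}
for smarts in ('OCC', 'CC', '[#0]O'):
    patt = Chem.MolFromSmarts(smarts)
    # (matches in the full molecule, matches in the masked molecule)
    results[smarts] = (full.GetSubstructMatches(patt),
                       masked.GetSubstructMatches(patt))
    print(smarts, *results[smarts])
```

Carbon SMARTS never match the dummy, while `[#0]` finds it; the flip side is that any chemistry hidden behind the dummy is invisible to every search.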
Re: [Rdkit-discuss] Masking groups as atoms in RDKit
Where would you want to use this? Is it for depiction (i.e. drawing molecules) or something else?

-greg

On Tue, Sep 26, 2017 at 10:12 PM, Kovas Palunas wrote:
> [...]
[Rdkit-discuss] Masking groups as atoms in RDKit
Hi all,

Has anyone tried implementing or using a group-to-atom masking strategy in RDKit? By this I mean taking a piece of a molecule and representing it as a single atom. Here is an example:

CCCCO could be represented as [But]O, where the atom [But] represents the four-carbon chain.

In my case I'm particularly interested in using this strategy to represent large biological molecules / molecule pieces, such as coenzyme A.

If I were to implement this myself, is there a place in RDKit where atom types can be defined?

Thanks!

- Kovas