Hi
The question 'what do you mean by ALL?' springs to mind. None of the
discussion includes dot-disconnected SMILES, which are also perfectly valid
representations. For example C(C1C2)C.C12 is yet another SMILES (of many
possible) for the example structure.
I've no idea whether this is of any relevance to you, but you should
probably consider these representations and decide whether they are
important or not.
Best regards,
Chris
On 6 August 2018 at 11:27, Jan Halborg Jensen <jhjen...@chem.ku.dk> wrote:
> This blogpost links to two other ones that may have done that (haven’t
> read them carefully): https://baoilleach.blogspot.com/2018/06/
> cheminformatics-for-deep-learners.html
>
> Best regards, Jan
>
> On 06 Aug 2018, at 11:57, Guillaume GODIN <guillaume.go...@firmenich.com>
> wrote:
>
> Dear Greg,
>
> Fantastic, thank you to give both explanation and solution to this “simple
> question”, I know this is not so simple & it’s fundamental for data
> augmentation in deep learning.
>
> If I may, I have another question related, do you know if someone has
> worked on a generator of all unique smiles independently of RDKit ?
>
> Thanks again,
>
> Guillaume
>
> *De : *Greg Landrum <greg.land...@gmail.com>
> *Date : *lundi, 6 août 2018 à 11:40
> *À : *Guillaume GODIN <guillaume.go...@firmenich.com>
> *Cc : *RDKit Discuss <rdkit-discuss@lists.sourceforge.net>
> *Objet : *Re: [Rdkit-discuss] enumeration of smiles question
>
>
> On Thu, Aug 2, 2018 at 8:59 AM Guillaume GODIN <
> guillaume.go...@firmenich.com> wrote:
>
>
> I have a simple question about generating all possible smiles of a given
> molecule:
>
>
> It's a simple question, but the answer is somewhat complicated. :-)
>
>
>
> RDKit provides only 4 differents smiles for my molecule “CCC1CC1“:
> C1C(CC)C1
> CCC1CC1
> C1(CC)CC1
> C(C)C1CC1
>
> While by hand we can write those 7 smiles:
> CCC1CC1
> C(C)C1CC1
> C(C1CC1)C
> C1CC(CC)1
> C1C(CC)C1
> C1CC1CC
> C(CC)1CC1
>
> I use this function for the enumeration:
>
> def allsmiles(smil):
> m = Chem.MolFromSmiles(smil) # Construct a molecule from a SMILES
> string.
> if m is None:
> return smil
> N = m.GetNumAtoms()
> if N==0:
> return smil
> try:
> n= np.random.randint(0,high=N)
> t= Chem.MolToSmiles(m, rootedAtAtom=n, canonical=False)
> except :
> return smil
> return t
>
> n= 50
> SMILES = [“CCC1CC1”]
> SMILES_mult = [allsmiles(S) for S in SMILES for i in range(n)]
>
> Why we cannot generate all the 7 smiles ?
>
>
> The RDKit has rules that it uses to decide which atom to branch to when
> generating a SMILES. These are used regardless of whether you are
> generating canonical SMILES or not.
> The upshot of this is that it will never generate a SMILES where there's a
> branch before a ring closure.
> The other important factor here is that atom rank is determined by the
> index of the atom in the molecule when you aren't using canonicalization.
> So changing the atom order on input can help:
>
> In [12]: set(allsmiles('CCC1CC1') for i in range(50))
> Out[12]: {'C(C)C1CC1', 'C1(CC)CC1', 'C1C(CC)C1', 'CCC1CC1'}
>
> In [13]: set(allsmiles('C1CC1CC') for i in range(50))
> Out[13]: {'C(C1CC1)C', 'C1(CC)CC1', 'C1CC1CC', 'CCC1CC1'}
>
> You can do this all at once as follows:
>
> ```
> In [20]: def allsmiles(smil):
> ...: m = Chem.MolFromSmiles(smil) # Construct a molecule from a
> SMILES string.
> ...: if m is None:
> ...: return smil
> ...: N = m.GetNumAtoms()
> ...: if N==0:
> ...: return smil
> ...: aids = list(range(N))
> ...: random.shuffle(aids)
> ...: m = Chem.RenumberAtoms(m,aids)
> ...: try:
> ...: n= random.randint(0,N-1)
> ...: t= Chem.MolToSmiles(m, rootedAtAtom=n, canonical=False)
> ...: except :
> ...: return smil
> ...: return t
> ...:
> ...:
> ...:
>
> In [21]:
>
> In [21]: set(allsmiles('C1CC1CC') for i in range(50))
> Out[21]: {'C(C)C1CC1', 'C(C1CC1)C', 'C1(CC)CC1', 'C1C(CC)C1', 'C1CC1CC',
> 'CCC1CC1'}
> ```
> Note that I switched to using python's built in random module instead of
> using the one in numpy.
>
> -greg
>
>
>
>
>
> Thanks guys,
>
> Best regards,
>
> Guillaume
> ************************************************************
> ***********************
> DISCLAIMER
> This email and any files transmitted with it, including replies and
> forwarded copies (which may contain alterations) subsequently transmitted
> from Firmenich, are confidential and solely for the use of the intended
> recipient. The contents do not represent the opinion of Firmenich except to
> the extent that it relates to their official business.
> ************************************************************
> ***********************
> ------------------------------------------------------------
> ------------------
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org <http://slashdot.org/>! http://sd
> m.link/slashdot_______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
> ************************************************************
> ***********************
> DISCLAIMER
> This email and any files transmitted with it, including replies and
> forwarded copies (which may contain alterations) subsequently transmitted
> from Firmenich, are confidential and solely for the use of the intended
> recipient. The contents do not represent the opinion of Firmenich except to
> the extent that it relates to their official business.
> ************************************************************
> ***********************
> ------------------------------------------------------------
> ------------------
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org <http://slashdot.org/>! http://sd
> m.link/slashdot_______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
>
>
> ------------------------------------------------------------
> ------------------
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
>
------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss