Dear Guillaume,

Sorry for interruption, but you've mentioned to this paper, haven't you?
"SMILES Enumeration as Data Augmentation for Neural Network Modeling of
Molecules"
https://arxiv.org/pdf/1703.07076.pdf

The author says that RDKit is used in the paper. And its implementation is
published on github: https://github.com/Ebjerrum/SMILES-enumeration
I wish that this will help you.

Best regards,
Shojiro

On 6 August 2018 at 18:57, Guillaume GODIN <guillaume.go...@firmenich.com>
wrote:

> Dear Greg,
>
>
>
> Fantastic, thank you to give both explanation and solution to this “simple
> question”, I know this is not so simple & it’s fundamental for data
> augmentation in deep learning.
>
>
>
> If I may, I have another question related, do you know if someone has
> worked on a generator of all unique smiles independently of RDKit ?
>
>
>
> Thanks again,
>
>
>
> Guillaume
>
>
>
> *De : *Greg Landrum <greg.land...@gmail.com>
> *Date : *lundi, 6 août 2018 à 11:40
> *À : *Guillaume GODIN <guillaume.go...@firmenich.com>
> *Cc : *RDKit Discuss <rdkit-discuss@lists.sourceforge.net>
> *Objet : *Re: [Rdkit-discuss] enumeration of smiles question
>
>
>
>
>
> On Thu, Aug 2, 2018 at 8:59 AM Guillaume GODIN <
> guillaume.go...@firmenich.com> wrote:
>
>
>
> I have a simple question about generating all possible smiles of a given
> molecule:
>
>
>
> It's a simple question, but the answer is somewhat complicated. :-)
>
>
>
>
>
> RDKit provides only 4 differents smiles for my molecule “CCC1CC1“:
>
> C1C(CC)C1
>
> CCC1CC1
>
> C1(CC)CC1
>
> C(C)C1CC1
>
>
>
> While by hand we can write those 7 smiles:
>
> CCC1CC1
>
> C(C)C1CC1
>
> C(C1CC1)C
>
> C1CC(CC)1
>
> C1C(CC)C1
>
> C1CC1CC
>
> C(CC)1CC1
>
>
>
> I use this function for the enumeration:
>
>
>
> def allsmiles(smil):
>
>     m = Chem.MolFromSmiles(smil) # Construct a molecule from a SMILES
> string.
>
>     if m is None:
>
>         return smil
>
>     N = m.GetNumAtoms()
>
>     if N==0:
>
>         return smil
>
>     try:
>
>         n= np.random.randint(0,high=N)
>
>         t= Chem.MolToSmiles(m, rootedAtAtom=n, canonical=False)
>
>     except :
>
>         return smil
>
>     return t
>
>
>
> n= 50
>
> SMILES = [“CCC1CC1”]
>
> SMILES_mult = [allsmiles(S) for S in SMILES for i in range(n)]
>
>
>
> Why we cannot generate all the 7 smiles ?
>
>
>
> The RDKit has rules that it uses to decide which atom to branch to when
> generating a SMILES. These are used regardless of whether you are
> generating canonical SMILES or not.
>
> The upshot of this is that it will never generate a SMILES where there's a
> branch before a ring closure.
>
> The other important factor here is that atom rank is determined by the
> index of the atom in the molecule when you aren't using canonicalization.
> So changing the atom order on input can help:
>
> In [12]: set(allsmiles('CCC1CC1') for i in range(50))
>
> Out[12]: {'C(C)C1CC1', 'C1(CC)CC1', 'C1C(CC)C1', 'CCC1CC1'}
>
>
>
> In [13]: set(allsmiles('C1CC1CC') for i in range(50))
>
> Out[13]: {'C(C1CC1)C', 'C1(CC)CC1', 'C1CC1CC', 'CCC1CC1'}
>
> You can do this all at once as follows:
>
>
>
> ```
>
> In [20]: def allsmiles(smil):
>
>     ...:     m = Chem.MolFromSmiles(smil) # Construct a molecule from a
> SMILES string.
>
>     ...:     if m is None:
>
>     ...:         return smil
>
>     ...:     N = m.GetNumAtoms()
>
>     ...:     if N==0:
>
>     ...:         return smil
>
>     ...:     aids = list(range(N))
>
>     ...:     random.shuffle(aids)
>
>     ...:     m = Chem.RenumberAtoms(m,aids)
>
>     ...:     try:
>
>     ...:         n= random.randint(0,N-1)
>
>     ...:         t= Chem.MolToSmiles(m, rootedAtAtom=n, canonical=False)
>
>     ...:     except :
>
>     ...:         return smil
>
>     ...:     return t
>
>     ...:
>
>     ...:
>
>     ...:
>
>
>
> In [21]:
>
>
>
> In [21]: set(allsmiles('C1CC1CC') for i in range(50))
>
> Out[21]: {'C(C)C1CC1', 'C(C1CC1)C', 'C1(CC)CC1', 'C1C(CC)C1', 'C1CC1CC',
> 'CCC1CC1'}
>
> ```
>
> Note that I switched to using python's built in random module instead of
> using the one in numpy.
>
>
>
> -greg
>
>
>
>
>
>
>
>
>
> Thanks guys,
>
>
>
> Best regards,
>
>
>
> Guillaume
>
> ************************************************************
> ***********************
> DISCLAIMER
> This email and any files transmitted with it, including replies and
> forwarded copies (which may contain alterations) subsequently transmitted
> from Firmenich, are confidential and solely for the use of the intended
> recipient. The contents do not represent the opinion of Firmenich except to
> the extent that it relates to their official business.
> ************************************************************
> ***********************
>
> ------------------------------------------------------------
> ------------------
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot______
> _________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
> ************************************************************
> ***********************
> DISCLAIMER
> This email and any files transmitted with it, including replies and
> forwarded copies (which may contain alterations) subsequently transmitted
> from Firmenich, are confidential and solely for the use of the intended
> recipient. The contents do not represent the opinion of Firmenich except to
> the extent that it relates to their official business.
> ************************************************************
> ***********************
>
> ------------------------------------------------------------
> ------------------
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
>


-- 
----
The University of Tokyo
2nd year Ph.D. candidate
  Shojiro Shibayama
----
------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss

Reply via email to