Hello Wim,

Thank you very much for writing and sharing the script.

This is very helpful and should allow the exploration of different scenarios.

Kind regards,
Gyro


On 2021-12-08 12:59 PM, Wim Dehaen wrote:
Because SCCPs are simply chlorinated n-alkanes (so strictly linear), they can also be enumerated with a more limited method than the interesting linked preprint.
This can be done exhaustively at the string level.
A given carbon in the chain carries 0, 1, or 2 chlorines (and in all examples given, just 0 or 1). Either cap (terminal carbon) of the molecule can also carry a third chlorine (a trichloromethyl cap). n-Alkanes are linear, so the SMILES only needs branching for Cl, which has a valence of 1 and is therefore always terminal. The following script can therefore be used to generate them exhaustively, and even to filter them by chlorine ratio, without ever having to convert to a mol:

```
import rdkit
from rdkit import Chem
import itertools as it

allsmi=[]
maxvalence=2

def cl_ratio(Cl,C):
    return Cl*35.5/(C*12+Cl*35.5+C*2-Cl+2) #ratio of chlorine vs total MW

for chainlength in [10,11,12,13]:
    combinations = (list(i) for i in it.product(list(range(maxvalence+1)),repeat=chainlength) if tuple(reversed(i)) >= tuple(i)) #filter out mirror image molecules
    for comb in combinations:
        curr_smi = ""
        for cl_count in comb:
            curr_smi+="C" # increase chain with one carbon
            for i in range(cl_count):
                curr_smi+="(Cl)" # chlorinate as needed
        if 0.4<=cl_ratio(sum(comb),chainlength)<=0.7:
            allsmi.append(curr_smi) #add if it has the correct ratio of Cl to MW
        if maxvalence>1: # check for molecules that have a trichloromethyl terminal cap
            if comb[0]==2:
                if 0.4<=cl_ratio(sum(comb)+1,chainlength)<=0.7:
                    allsmi.append("Cl"+curr_smi) # prepend a bare Cl: a SMILES string cannot start with a branch
                    if comb[-1]==2:
                        if 0.4<=cl_ratio(sum(comb)+2,chainlength)<=0.7:
                            allsmi.append("Cl"+curr_smi+"(Cl)") # trichloromethyl cap on both ends, matching the +2 in the ratio check
            else:
                if comb[-1]==2:
                    if 0.4<=cl_ratio(sum(comb)+1,chainlength)<=0.7:
                        allsmi.append(curr_smi+"(Cl)")
with open('CSSP.smi', 'w') as f:
    for smi in allsmi:
        f.write("%s\n" % smi)
```
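To make the string construction and the mirror-image filter easier to follow, here is a minimal sketch on a 3-carbon chain with at most one chlorine per carbon. The helper names comb_to_smiles and unique_chains are made up for this illustration and do not appear in the script above; the logic is meant to mirror its inner loop and its reversed-tuple filter.

```python
import itertools as it

def comb_to_smiles(comb):
    """Build a linear SMILES from per-carbon chlorine counts, e.g. (0, 1, 2)."""
    return "".join("C" + "(Cl)" * n_cl for n_cl in comb)

def unique_chains(chainlength, maxvalence):
    """Yield one count tuple per molecule, dropping mirror-image duplicates."""
    for comb in it.product(range(maxvalence + 1), repeat=chainlength):
        # A chain read backwards is the same molecule, so keep only the
        # orientation for which reversed(comb) >= comb (palindromes pass).
        if tuple(reversed(comb)) >= comb:
            yield comb

# Toy case: 3 carbons, 0 or 1 chlorine on each carbon.
small_set = [comb_to_smiles(c) for c in unique_chains(3, 1)]
print(small_set)
```

Of the 8 possible count tuples, 4 are palindromes and the other 4 form 2 mirror pairs, so 6 strings remain.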

It takes about 5 seconds to run on a desktop, and the output is a .smi file with 437001 SMILES strings. If you want at most one chlorine per carbon, the problem is easier and faster: change maxvalence to 1 in the script above, which outputs ~7K molecules.
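As a rough cross-check of the mass filter, the 40-70% window can be turned into an explicit range of chlorine counts per chain length. This is only a sketch: it reuses the rounded atomic masses from cl_ratio in the script (C = 12, H = 1, Cl = 35.5), and allowed_cl_counts with its parameters is a hypothetical helper, not part of the script.

```python
def cl_ratio(n_cl, n_c):
    """Chlorine mass fraction of C(n_c) H(2*n_c + 2 - n_cl) Cl(n_cl)."""
    mass_cl = 35.5 * n_cl
    mass_total = 12 * n_c + 1 * (2 * n_c + 2 - n_cl) + mass_cl
    return mass_cl / mass_total

def allowed_cl_counts(n_c, low=0.4, high=0.7, max_per_carbon=2):
    """List the chlorine counts whose mass fraction lies in [low, high]."""
    # Upper bound on Cl atoms: max_per_carbon on every carbon, plus one
    # extra on each terminal carbon when trichloromethyl caps are possible.
    max_cl = n_c * max_per_carbon + (2 if max_per_carbon > 1 else 0)
    return [y for y in range(max_cl + 1) if low <= cl_ratio(y, n_c) <= high]

for n_c in (10, 11, 12, 13):
    print(n_c, allowed_cl_counts(n_c))
```

With these masses a C10 chain passes the filter with 3 to 8 chlorines and a C13 chain with 4 to 11, so the ratio filter alone already keeps the chlorine count within the y = 3-12 range from the original question.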


best wishes

On Wed, Dec 8, 2021 at 11:29 AM Gyro Funch <gyromagne...@gmail.com> wrote:

    Thank you very much for the pointer. I will investigate.

    Kind regards,
    Gyro

    On 2021-12-08 11:15 AM, Jan Halborg Jensen wrote:
    This package might do the trick:
    https://doi.org/10.26434/chemrxiv-2021-gt5lb

    On 8 Dec 2021, at 11.02, Gyro Funch <gyromagne...@gmail.com> wrote:

    Hello,

    I am not a chemist, but have been using RDKit to generate descriptors
    and fingerprints for molecules with known SMILES. It is a very useful
    package!

    I have a problem on which I hope someone can provide some guidance.

    My work is in the area of toxicology and I am interested in generating
    SMILES for molecules referred to as 'short chain chlorinated paraffins'
    (SCCP).

    A general definition that is sometimes used is that an SCCP is given by
    the molecular formula

    C_{x} H_{2x-y+2} Cl_{y}

    where

    x = 10-13
    y = 3-12

    and the average chlorine content ranges from 40-70% by mass.

    -----

    Can anyone provide guidance on how to generate the list of SMILES
    corresponding to the above rules?

    Thank you very much for your help!

    Kind regards,
    gyro


    _______________________________________________
    Rdkit-discuss mailing list
    Rdkit-discuss@lists.sourceforge.net
    https://lists.sourceforge.net/lists/listinfo/rdkit-discuss

