Because SCCPs are derived only by chlorinating n-alkanes (i.e. unbranched
chains), they can also be enumerated with a much more limited method than the
one in the interesting preprint linked by Jan.
This can be done exhaustively at the string level: each carbon in the chain
carries 0, 1 or 2 chlorines (and in all the examples given, just 0 or 1).
Either terminal carbon (cap) of the molecule can also take a third chlorine,
giving a trichloromethyl cap. Since n-alkanes are linear, the SMILES only
needs branches for the Cl atoms, and because Cl has a valence of 1, each of
those branches is terminal.
The following script therefore generates them exhaustively, and even filters
them by chlorine content, without ever converting a SMILES string into an
RDKit mol:

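```
import itertools as it

allsmi = []
maxvalence = 2  # max. number of Cl per carbon (a terminal carbon may take one more)


def cl_ratio(Cl, C):
    # Mass fraction of Cl in C_C H_(2C-Cl+2) Cl_Cl, using C=12, H=1, Cl=35.5
    return Cl * 35.5 / (C * 12 + Cl * 35.5 + C * 2 - Cl + 2)


def ratio_ok(Cl, C):
    # Keep molecules whose Cl content is 40-70 % of the molecular weight
    return 0.4 <= cl_ratio(Cl, C) <= 0.7


for chainlength in [10, 11, 12, 13]:
    # One entry per carbon = number of Cl on that carbon. A pattern and its
    # reverse describe the same molecule, so keep only one of each pair.
    combinations = (
        comb
        for comb in it.product(range(maxvalence + 1), repeat=chainlength)
        if comb[::-1] >= comb
    )
    for comb in combinations:
        curr_smi = ""
        for cl_count in comb:
            curr_smi += "C"  # extend the chain by one carbon
            curr_smi += "(Cl)" * cl_count  # chlorinate as needed
        n_cl = sum(comb)
        if ratio_ok(n_cl, chainlength):
            allsmi.append(curr_smi)
        if maxvalence > 1:
            # A terminal carbon that already has 2 Cl can take a third one
            # (trichloromethyl cap). A SMILES string cannot start with a
            # branch, so the extra Cl on the first carbon is written as "Cl".
            can_cap_left = comb[0] == 2
            can_cap_right = comb[-1] == 2
            if can_cap_left and ratio_ok(n_cl + 1, chainlength):
                allsmi.append("Cl" + curr_smi)
            # For a palindromic pattern the right-capped molecule is the
            # mirror image of the left-capped one, so skip it there.
            if can_cap_right and comb != comb[::-1] and ratio_ok(n_cl + 1, chainlength):
                allsmi.append(curr_smi + "(Cl)")
            if can_cap_left and can_cap_right and ratio_ok(n_cl + 2, chainlength):
                allsmi.append("Cl" + curr_smi + "(Cl)")  # both ends capped

with open("SCCP.smi", "w") as f:
    for smi in allsmi:
        f.write("%s\n" % smi)
```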
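
A side check on the 40-70 % filter: with the rounded masses used in cl_ratio
(C = 12, H = 1, Cl = 35.5), the mass-fraction window alone already fixes which
chlorine counts y are possible for each chain length. This is a small sketch;
cl_count_bounds is a hypothetical helper, not part of the script above. It
simply scans every y from 0 up to the 2x + 2 hydrogens of a CxH(2x+2) alkane:

```python
def cl_ratio(Cl, C):
    # Same Cl mass-fraction formula as in the script (C=12, H=1, Cl=35.5)
    return Cl * 35.5 / (C * 12 + Cl * 35.5 + C * 2 - Cl + 2)


def cl_count_bounds(chainlength, lo=0.4, hi=0.7):
    # Hypothetical helper: a C_x alkane has 2x+2 hydrogens, so y runs 0..2x+2.
    # cl_ratio grows monotonically with y, so the admissible y form one interval.
    ok = [y for y in range(2 * chainlength + 3)
          if lo <= cl_ratio(y, chainlength) <= hi]
    return min(ok), max(ok)


for x in [10, 11, 12, 13]:
    print(x, cl_count_bounds(x))
# prints:
# 10 (3, 8)
# 11 (3, 9)
# 12 (4, 10)
# 13 (4, 11)
```

All of these lie inside the y = 3-12 range from the definition in the
question, so filtering on the chlorine mass fraction also enforces the y limit.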

It takes about 5 seconds to run on a desktop, and the output is a .smi file
with several hundred thousand SMILES strings (437,001 when I ran it). If you
want at most one chlorine per carbon, the problem is simpler and faster: in
the script above, change maxvalence to 1, which outputs ~7K molecules.
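
The ~7K figure for maxvalence = 1 can be cross-checked without building any
strings. The sketch below (all function names are hypothetical, not part of
the script) counts the kept 0/1 patterns twice: once by brute force, with the
same mirror filter and 40-70 % window as the script, and once in closed form.
Because a non-palindromic pattern and its reverse collapse to one entry while
a palindrome is kept once, the number of molecules with y chlorines on an
n-carbon chain is (C(n, y) + P(n, y)) / 2, where P(n, y) counts palindromic
patterns with y ones. This only applies to maxvalence = 1, where the
trichloromethyl branch of the script is skipped.

```python
import itertools as it
from math import comb


def cl_ratio(Cl, C):
    # Same Cl mass-fraction formula as the script (C=12, H=1, Cl=35.5)
    return Cl * 35.5 / (C * 12 + Cl * 35.5 + C * 2 - Cl + 2)


def brute_force_count(chainlength, lo=0.4, hi=0.7):
    # Enumerate all 0/1 patterns, keep one of each mirror pair (same filter
    # as the script) and apply the Cl mass-fraction window.
    return sum(
        1
        for pat in it.product(range(2), repeat=chainlength)
        if pat[::-1] >= pat and lo <= cl_ratio(sum(pat), chainlength) <= hi
    )


def palindromes_with_weight(n, y):
    # A palindrome of length n is fixed by its first n // 2 positions plus,
    # for odd n, the middle one; the first half counts twice towards y.
    half, odd = divmod(n, 2)
    total = 0
    for mid in range(odd + 1):
        rest = y - mid
        if rest >= 0 and rest % 2 == 0 and rest // 2 <= half:
            total += comb(half, rest // 2)
    return total


def closed_form_count(chainlength, lo=0.4, hi=0.7):
    total = 0
    for y in range(chainlength + 1):
        if lo <= cl_ratio(y, chainlength) <= hi:
            # number of {pattern, reversed pattern} classes with y chlorines
            total += (comb(chainlength, y) + palindromes_with_weight(chainlength, y)) // 2
    return total


for n in [10, 11, 12, 13]:
    assert brute_force_count(n) == closed_form_count(n)
print(sum(closed_form_count(n) for n in [10, 11, 12, 13]))  # prints 7379
```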


Best wishes,

On Wed, Dec 8, 2021 at 11:29 AM Gyro Funch <gyromagne...@gmail.com> wrote:

> Thank you very much for the pointer. I will investigate.
>
> Kind regards,
> Gyro
>
> On 2021-12-08 11:15 AM, Jan Halborg Jensen wrote:
>
> This package might do the trick:
> https://doi.org/10.26434/chemrxiv-2021-gt5lb
>
> On 8 Dec 2021, at 11.02, Gyro Funch <gyromagne...@gmail.com> wrote:
>
> Hello,
>
> I am not a chemist, but have been using RDKit to generate descriptors
> and fingerprints for molecules with known SMILES. It is a very useful
> package!
>
> I have a problem on which I hope someone can provide some guidance.
>
> My work is in the area of toxicology and I am interested in generating
> SMILES for molecules referred to as 'short chain chlorinated paraffins'
> (SCCP).
>
> A general definition that is sometimes used is that an SCCP is given by
> the molecular formula
>
> C_{x} H_{2x-y+2} Cl_{y}
>
> where
>
> x = 10-13
> y = 3-12
>
> and the average chlorine content ranges from 40-70% by mass.
>
> -----
>
> Can anyone provide guidance on how to generate the list of SMILES
> corresponding to the above rules?
>
> Thank you very much for your help!
>
> Kind regards,
> gyro
>
>
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss