Because SCCPs are simply chlorinated n-alkanes (so strictly linear), they can also be enumerated with a more limited method than the interesting linked preprint. This can be done exhaustively at the string level, because a given carbon in the chain carries 0, 1 or 2 chlorines (and in all examples given just 0 or 1). Either cap (terminal carbon) of the molecule can also carry a third chlorine (a trichloromethyl cap). n-Alkanes are linear, so our SMILES only needs branches for Cl, which has a valence of 1 and is therefore always terminal. The following script can therefore be used to generate them exhaustively, and even to filter them by chlorine ratio, without ever having to convert to a mol.
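Before the full script, here is a minimal sketch of the same string-level idea on a short chain. The helper names (pattern_to_smiles, unique_patterns, cl_mass_fraction) are made up for illustration. Note one assumption that differs from the script further down: the sketch handles the trichloromethyl cap by simply letting the two terminal carbons take up to 3 chlorines, while inner carbons take up to 2.

```python
import itertools as it


def pattern_to_smiles(pattern):
    """Turn per-carbon chlorine counts, e.g. (1, 0, 2), into a SMILES string."""
    return "".join("C" + "(Cl)" * n_cl for n_cl in pattern)


def unique_patterns(chain_length, max_inner=2, max_cap=3):
    """Yield one chlorine pattern per molecule (assumes chain_length >= 2).

    Inner carbons carry 0..max_inner chlorines, the two terminal carbons
    0..max_cap. A pattern and its reverse describe the same molecule, so
    only the one that sorts first (or is its own reverse) is kept.
    """
    caps = range(max_cap + 1)
    inner = [range(max_inner + 1)] * (chain_length - 2)
    for pattern in it.product(caps, *inner, caps):
        if pattern <= pattern[::-1]:
            yield pattern


def cl_mass_fraction(n_cl, n_c):
    """Chlorine mass fraction of C_x H_(2x-y+2) Cl_y, rounded atomic masses."""
    n_h = 2 * n_c - n_cl + 2
    return n_cl * 35.5 / (n_c * 12 + n_h + n_cl * 35.5)


if __name__ == "__main__":
    # Three carbons, at most one chlorine each: small enough to check by hand.
    for pattern in unique_patterns(3, max_inner=1, max_cap=1):
        n_cl = sum(pattern)
        print(pattern, pattern_to_smiles(pattern), round(cl_mass_fraction(n_cl, 3), 3))
```

For example, pattern_to_smiles((1, 0, 2)) gives C(Cl)CC(Cl)(Cl), and the three-carbon case above yields 6 patterns instead of 8 because (1, 0, 0) and (1, 1, 0) are dropped as mirror images of (0, 0, 1) and (0, 1, 1). The full script follows.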
```
import itertools as it

allsmi = []
maxvalence = 2  # max chlorines per carbon, not counting the extra one on a CCl3 cap


def cl_ratio(Cl, C):
    # ratio of chlorine vs total MW for C(x) H(2x-y+2) Cl(y), rounded atomic masses
    return Cl * 35.5 / (C * 12 + Cl * 35.5 + C * 2 - Cl + 2)


for chainlength in [10, 11, 12, 13]:
    # one chlorine count per carbon; filter out mirror image molecules
    combinations = (
        i
        for i in it.product(range(maxvalence + 1), repeat=chainlength)
        if tuple(reversed(i)) >= i
    )
    for comb in combinations:
        curr_smi = ""
        for cl_count in comb:
            curr_smi += "C"  # increase chain with one carbon
            curr_smi += "(Cl)" * cl_count  # chlorinate as needed
        if 0.4 <= cl_ratio(sum(comb), chainlength) <= 0.7:
            allsmi.append(curr_smi)  # add if it has the correct ratio of Cl to MW
        if maxvalence > 1:  # check for molecules that have a trichloromethyl terminal cap
            if comb[0] == 2:
                if 0.4 <= cl_ratio(sum(comb) + 1, chainlength) <= 0.7:
                    # a SMILES string cannot start with a branch, so prepend a bare Cl
                    allsmi.append("Cl" + curr_smi)
                if comb[-1] == 2:
                    if 0.4 <= cl_ratio(sum(comb) + 2, chainlength) <= 0.7:
                        allsmi.append("Cl" + curr_smi + "(Cl)")  # CCl3 on both ends
                    # note: for a non-symmetric pattern with CCl2 at both ends, the
                    # variant with a CCl3 on the tail only is not generated here
            else:
                if comb[-1] == 2:
                    if 0.4 <= cl_ratio(sum(comb) + 1, chainlength) <= 0.7:
                        allsmi.append(curr_smi + "(Cl)")

with open('CSSP.smi', 'w') as f:
    for smi in allsmi:
        f.write("%s\n" % smi)
```

It takes about 5 seconds to run on a desktop, and the output is a .smi file with 437001 SMILES strings. If you want at most one chlorine per carbon, the solution is a bit easier and faster: in the above script you can change maxvalence to 1, which outputs ~7K molecules.

Best wishes

On Wed, Dec 8, 2021 at 11:29 AM Gyro Funch <gyromagne...@gmail.com> wrote:

> Thank you very much for the pointer. I will investigate.
>
> Kind regards,
> Gyro
>
> On 2021-12-08 11:15 AM, Jan Halborg Jensen wrote:
>
> This package might do the trick:
> https://doi.org/10.26434/chemrxiv-2021-gt5lb
>
> On 8 Dec 2021, at 11.02, Gyro Funch <gyromagne...@gmail.com> wrote:
>
> Hello,
>
> I am not a chemist, but have been using RDKit to generate descriptors
> and fingerprints for molecules with known SMILES. It is a very useful
> package!
>
> I have a problem on which I hope someone can provide some guidance.
>
> My work is in the area of toxicology and I am interested in generating
> SMILES for molecules referred to as 'short chain chlorinated paraffins'
> (SCCP).
>
> A general definition that is sometimes used is that an SCCP is given by
> the molecular formula
>
> C_{x} H_{2x-y+2} Cl_{y}
>
> where
>
> x = 10-13
> y = 3-12
>
> and the average chlorine content ranges from 40-70% by mass.
>
> -----
>
> Can anyone provide guidance on how to generate the list of SMILES
> corresponding to the above rules?
>
> Thank you very much for your help!
>
> Kind regards,
> gyro
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss