Hello Wim,
Thank you very much for writing and sharing the script.
This is very helpful and should allow the exploration of different
scenarios.
Kind regards,
Gyro
On 2021-12-08 12:59 PM, Wim Dehaen wrote:
Because SCCPs are just chlorinated n-alkanes (so strictly linear), they
can also be enumerated with a more limited method than the interesting
linked preprint.
This can be done exhaustively at the string level: a given carbon in the
chain has 0, 1, or 2 chlorines (and in all examples given just 0 or 1).
Either cap (terminal carbon) of the molecule can also carry a third
chlorine (a trichloromethyl cap). n-Alkanes are linear, so the SMILES
only needs branches for Cl, which has a valence of 1 and is therefore
always terminal.
The following script can therefore be used to generate them
exhaustively, and even to filter them by chlorine ratio, without ever
having to convert to a mol:
```
import itertools as it  # RDKit is not needed: everything happens at the string level

allsmi = []
maxvalence = 2  # maximum number of chlorines on a non-capped carbon

def cl_ratio(Cl, C):
    # mass fraction of chlorine in C_C H_(2C+2-Cl) Cl_Cl, using C=12, H=1, Cl=35.5
    return Cl*35.5/(C*12 + Cl*35.5 + C*2 - Cl + 2)

for chainlength in [10, 11, 12, 13]:
    # filter out mirror image molecules (the same chain read from the other end)
    combinations = (i for i in it.product(range(maxvalence+1), repeat=chainlength)
                    if tuple(reversed(i)) >= i)
    for comb in combinations:
        # one "C" per carbon, followed by one "(Cl)" branch per chlorine on it
        curr_smi = "".join("C" + "(Cl)"*cl_count for cl_count in comb)
        if 0.4 <= cl_ratio(sum(comb), chainlength) <= 0.7:
            allsmi.append(curr_smi)  # add if it has the correct ratio of Cl to MW
        if maxvalence > 1:  # check for molecules that have a trichloromethyl terminal cap
            if comb[0] == 2:
                # the mirror filter guarantees comb[0] <= comb[-1], so comb[-1] == 2 here too
                if 0.4 <= cl_ratio(sum(comb)+1, chainlength) <= 0.7:
                    # a SMILES string cannot start with a branch, so prefix a bare "Cl"
                    allsmi.append("Cl" + curr_smi)
                if comb[-1] == 2:
                    if 0.4 <= cl_ratio(sum(comb)+2, chainlength) <= 0.7:
                        # two extra chlorines means both ends are capped
                        allsmi.append("Cl" + curr_smi + "(Cl)")
                # not enumerated: the end-only cap, a distinct isomer unless comb is a palindrome
            else:
                if comb[-1] == 2:
                    if 0.4 <= cl_ratio(sum(comb)+1, chainlength) <= 0.7:
                        allsmi.append(curr_smi + "(Cl)")

with open('CSSP.smi', 'w') as f:
    for smi in allsmi:
        f.write("%s\n" % smi)
```
It takes about 5 seconds to run on a desktop, and the output is a .smi
file with 437001 SMILES strings. If you want at most 1 chlorine per
carbon, the problem is a bit easier and faster: in the above script,
change maxvalence to 1, which outputs ~7K molecules.
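As a possible variation on the same string-level idea, here is a minimal sketch that treats a trichloromethyl cap as nothing more than a terminal carbon allowed to carry up to 3 chlorines, so caps at either end (or both) fall out of a single enumeration. The function names (`cl_mass_fraction`, `enumerate_chloroalkanes`, `to_smiles`) and the `max_internal`/`max_terminal` parameters are made up for illustration, and the number of molecules it yields for chain lengths 10-13 has not been checked against the 437001 figure above.

```python
import itertools as it

def cl_mass_fraction(n_cl, n_c):
    """Chlorine mass fraction of C_n H_(2n+2-k) Cl_k with C=12, H=1, Cl=35.5."""
    n_h = 2 * n_c + 2 - n_cl
    return 35.5 * n_cl / (12.0 * n_c + 1.0 * n_h + 35.5 * n_cl)

def enumerate_chloroalkanes(chainlength, max_internal=2, max_terminal=3):
    """Yield per-carbon chlorine counts, one tuple per distinct linear molecule.

    Assumption: terminal carbons may carry up to max_terminal chlorines and
    internal carbons up to max_internal. Stereochemistry is ignored.
    """
    if chainlength < 2:
        raise ValueError("chainlength must be at least 2")
    ranges = ([range(max_terminal + 1)]
              + [range(max_internal + 1)] * (chainlength - 2)
              + [range(max_terminal + 1)])
    for comb in it.product(*ranges):
        # a chain and its reverse are the same molecule: keep the smaller tuple
        if comb <= comb[::-1]:
            yield comb

def to_smiles(comb):
    """One "C" per carbon, followed by one "(Cl)" branch per chlorine on it."""
    return "".join("C" + "(Cl)" * n for n in comb)

# small demonstration on a 4-carbon chain, filtered to 40-70% Cl by mass
demo = [to_smiles(c) for c in enumerate_chloroalkanes(4)
        if 0.4 <= cl_mass_fraction(sum(c), 4) <= 0.7]
```

For a chain of 2 carbons this yields 10 patterns, which matches ethane plus the nine chloroethanes, so the mirror-image filter appears to behave as intended on a case that can be checked by hand.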
best wishes
On Wed, Dec 8, 2021 at 11:29 AM Gyro Funch <gyromagne...@gmail.com> wrote:
Thank you very much for the pointer. I will investigate.
Kind regards,
Gyro
On 2021-12-08 11:15 AM, Jan Halborg Jensen wrote:
This package might do the trick:
https://doi.org/10.26434/chemrxiv-2021-gt5lb
On 8 Dec 2021, at 11.02, Gyro Funch <gyromagne...@gmail.com> wrote:
Hello,
I am not a chemist, but have been using RDKit to generate descriptors
and fingerprints for molecules with known SMILES. It is a very useful
package!
I have a problem on which I hope someone can provide some guidance.
My work is in the area of toxicology and I am interested in generating
SMILES for molecules referred to as 'short chain chlorinated paraffins'
(SCCP).
A general definition that is sometimes used is that an SCCP is given by
the molecular formula
C_{x} H_{2x-y+2} Cl_{y}
where
x = 10-13
y = 3-12
and the average chlorine content ranges from 40-70% by mass.
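The rules above can be checked at the formula level before any structures are built. The following is only a rough sketch: it assumes rounded atomic masses (C = 12, H = 1, Cl = 35.5), applies the 40-70% window to each individual formula rather than to a mixture average, and the name `cl_fraction` is made up for illustration.

```python
def cl_fraction(x, y):
    """Chlorine mass fraction of C_x H_(2x-y+2) Cl_y (rounded atomic masses)."""
    mass = 12.0 * x + 1.0 * (2 * x - y + 2) + 35.5 * y
    return 35.5 * y / mass

# (x, y) pairs within the stated ranges whose own Cl content is 40-70% by mass
pairs = [(x, y)
         for x in range(10, 14)   # x = 10-13
         for y in range(3, 13)    # y = 3-12
         if 0.40 <= cl_fraction(x, y) <= 0.70]

for x, y in pairs:
    print(f"C{x}H{2 * x - y + 2}Cl{y}: {100 * cl_fraction(x, y):.1f}% Cl")
```

With these masses, C10H19Cl3 passes the filter (about 43% Cl), while C13H25Cl3 falls just below 40% and C10H10Cl12 lies above 70%.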
-----
Can anyone provide guidance on how to generate the list of SMILES
corresponding to the above rules?
Thank you very much for your help!
Kind regards,
gyro
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss