Hi Benjamin,

The magic code uses a Python feature called a "list comprehension":
https://www.pythonforbeginners.com/basics/list-comprehensions-in-python


It does not parse the rxn string directly; it splits the string first. Since 
the reaction string has the form `reactants smiles>agents smiles>product smiles`, 
we can get the three SMILES sections with "rxn_string.split('>')".
Then, for each section, we split on "." to get the individual molecules. So 
[mols.split('.') for mols in rxn_string.split('>')] evaluates to 
[[reactant1, reactant2, ..], [agent1, agent2, ..], [product1, product2, ..]]. 
At this point they are all still SMILES strings, not molecule objects. The 
role of each molecule (reactant, agent, or product) comes only from its 
position relative to the '>' separators; this step does not look at atom 
mapping at all.
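
The two-level split described above can be tried on its own, without RDKit. 
The reaction string below is a made-up example (ethanol and acetic acid giving 
ethyl acetate and water, with sulfuric acid as the agent), not one taken from 
ASKCOS:

```python
# Hypothetical reaction SMILES in the form reactants>agents>products
rxn_string = "CCO.CC(=O)O>OS(=O)(=O)O>CCOC(C)=O.O"

# The outer split on '>' gives the three sections; the inner split on '.'
# gives the individual molecules within each section.
parts = [mols.split('.') for mols in rxn_string.split('>')]

reactants_smiles, agents_smiles, products_smiles = parts
print(reactants_smiles)  # ['CCO', 'CC(=O)O']
print(agents_smiles)     # ['OS(=O)(=O)O']
print(products_smiles)   # ['CCOC(C)=O', 'O']
```

Note that an empty agents section (as in "A>>B") gives [''], a list holding 
one empty string, rather than an empty list.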


mols_from_smiles_list is defined here: 
https://github.com/connorcoley/ASKCOS/blob/master/makeit/utilities/io/draw.py#L16
It simply converts a list of SMILES strings into a list of molecule objects. 
The only RDKit API it uses is "Chem.MolFromSmiles".


The magic code can be translated into:


reactants_smiles, agents_smiles, products_smiles = rxn_string.split('>')
package_results = []
for mols in (reactants_smiles, agents_smiles, products_smiles):
    x = mols.split('.')           # x is a list of SMILES strings
    y = mols_from_smiles_list(x)  # y is a list of molecule objects
    package_results.append(y)
reactants, agents, products = package_results


The code is not as cool this way, though.


I have no idea about the second question. May I ask where the parameters 
threshold_unmapped_reactant_atoms and move_unmmapped_reactants_to_agents are 
defined?


Best,


Hongbin Yang 杨弘宾, Ph.D.
Research: Toxicophore and Chemoinformatics

On 10/22/2019 13:08,Benjamin Datko<benjamin.datko....@gmail.com> wrote:
Hello all,


While reading the source code for ASKCOS 
(https://github.com/connorcoley/ASKCOS/blob/master/makeit/utilities/io/draw.py) 
I noticed this code snippet (line 216 on the GitHub):


reactants, agents, products = [mols_from_smiles_list(x) for x in 
[mols.split('.') for mols in rxn_string.split('>')]]



When the above code is applied to a SMILES reaction string, the result unpacks 
the reactant, agent, and product mol objects into their respective variables, 
with pretty good accuracy.  The function 'mols_from_smiles_list' essentially 
just applies Chem.MolFromSmiles over a list of SMILES.


I think this code snippet is really cool, but I cannot find any documentation 
on how it works. Searching this mailing list, I came across the thread 
(https://sourceforge.net/p/rdkit/mailman/message/36316849/), where the 
labeling of reactants, agents, and products seems to be determined by 
threshold_unmapped_reactant_atoms, as explained in this text quoted from the 
message linked above:


Here's what's going on: By default the cartridge code does an extra step after 
reading a reaction from SMILES/SMARTS: it looks at all the reactants and moves 
any that don't have a sufficient fraction of mapped atoms to the agents. We do 
this by default because the reactions that we found "in the wild" often have 
agents, solvents, etc. mixed in with the reactants. The key parameter used 
there is threshold_unmapped_reactant_atoms, which defaults to 0.2.
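
To make the quoted description concrete, here is a rough, RDKit-free sketch of 
the idea. It is not the cartridge's implementation. The atom counting uses an 
approximate regular expression, the helper names are hypothetical, and 
comparing the fraction of mapped atoms against the threshold with ">=" is an 
assumption based on the wording above:

```python
import re

# Approximate SMILES atom tokenizer: bracket atoms first, then two-letter
# halogens, then the one-letter organic subset (aliphatic and aromatic).
ATOM = re.compile(r"\[[^\]]*\]|Br|Cl|[BCNOPSFI]|[bcnops]")
# A bracket atom that carries an atom-map number ends in ':<digits>]'.
MAPPED = re.compile(r":\d+\]$")


def mapped_fraction(smiles):
    """Fraction of atoms in `smiles` that carry an atom-map number."""
    atoms = ATOM.findall(smiles)
    if not atoms:
        return 0.0
    mapped = sum(1 for atom in atoms if MAPPED.search(atom))
    return mapped / len(atoms)


def split_reactants(reactants, threshold=0.2):
    """Hypothetical helper: keep well-mapped reactants, demote the rest.

    Assumption: a reactant is kept when its mapped fraction is >= threshold.
    """
    kept, demoted = [], []
    for smi in reactants:
        if mapped_fraction(smi) >= threshold:
            kept.append(smi)
        else:
            demoted.append(smi)
    return kept, demoted


# Made-up input: two fully mapped reactants and one unmapped acid.
reactants = ["[CH3:1][OH:2]", "[CH3:3][C:4](=[O:5])[OH:6]", "OS(=O)(=O)O"]
kept, demoted = split_reactants(reactants)
print(kept)     # the two mapped molecules
print(demoted)  # ['OS(=O)(=O)O']
```

In this toy input the unmapped sulfuric acid is the only molecule that would 
be moved to the agents.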


The only further reading I can find is from Greg's paper 
(https://pubs.acs.org/doi/10.1021/ci5006614). I have two main questions: 


1. Where in the code is this atom mapping being applied? I cannot tell when 
this method is being applied or where the metadata is being saved. Applying 
the code snippet above to a SMILES reaction string results in a list of 
rdkit.Chem.rdchem.Mol objects. I cannot seem to find any static method or 
attribute specifying whether it's a reactant, agent, or product when 
inspecting a mol object using help in a Python terminal.


2. How can I change the value of the variables 
threshold_unmapped_reactant_atoms and move_unmmapped_reactants_to_agents? I am 
using rdkit version 2019.03.4 in an Anaconda environment. I want to experiment 
with changing the mapping threshold.


Very Respectfully,


Benjamin