Hi Benjamin,
The magic code uses a Python feature called "list comprehension":
https://www.pythonforbeginners.com/basics/list-comprehensions-in-python
It does not read the rxn string directly; it splits the string first. Since a
reaction string has the form `reactants SMILES>agents SMILES>products SMILES`,
we can get these three SMILES strings with "rxn_string.split('>')".
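A minimal illustration of that first split, using a made-up esterification reaction (the reaction string is purely illustrative). Because a reaction SMILES has exactly two '>' separators, split('>') returns three fields, which can be unpacked directly:

```python
# Hypothetical reaction string: esterification with an acid catalyst as agent.
rxn_string = "CC(=O)O.OCC>[H+]>CC(=O)OCC.O"

# split('>') yields exactly three fields, so tuple unpacking works directly.
reactants_smiles, agents_smiles, products_smiles = rxn_string.split('>')

print(reactants_smiles)  # CC(=O)O.OCC
print(agents_smiles)     # [H+]
print(products_smiles)   # CC(=O)OCC.O
```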
Then, for each part, we split on "." to get the individual molecules. So, in
the end, [mols.split('.') for mols in rxn_string.split('>')] becomes
[[reactant1, reactant2, ...], [agent1, agent2, ...], [product1, product2, ...]].
At this point they are all still SMILES strings.
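The nested comprehension can be checked on the same made-up reaction string. One detail the one-liner does not handle on its own: a reaction with no agents is written with '>>', so the middle field is the empty string, and ''.split('.') gives [''] rather than []. Any helper that parses these strings may therefore receive an empty SMILES. Whether the ASKCOS helper skips such entries is not shown in this thread.

```python
rxn_string = "CC(=O)O.OCC>[H+]>CC(=O)OCC.O"

# Outer comprehension: the three role fields; inner split: one SMILES per molecule.
nested = [mols.split('.') for mols in rxn_string.split('>')]
print(nested)  # [['CC(=O)O', 'OCC'], ['[H+]'], ['CC(=O)OCC', 'O']]

# Caveat: with no agents ('>>'), the middle field is '' and splitting it gives [''].
no_agents = [mols.split('.') for mols in "CCO>>CC=O".split('>')]
print(no_agents)  # [['CCO'], [''], ['CC=O']]
```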
mols_from_smiles_list is defined here:
https://github.com/connorcoley/ASKCOS/blob/master/makeit/utilities/io/draw.py#L16
It just converts a list of SMILES strings into a list of molecule objects. The
only RDKit API it uses is "Chem.MolFromSmiles".
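For experimenting outside ASKCOS, a rough stand-in for that helper might look like the sketch below. The name mols_from_smiles_list_sketch is hypothetical, and it simply maps Chem.MolFromSmiles over the list; the real ASKCOS function may differ in details such as how it treats empty strings. Note that Chem.MolFromSmiles returns None for a SMILES it cannot parse, so callers may want to check for that.

```python
from rdkit import Chem


def mols_from_smiles_list_sketch(smiles_list):
    """Hypothetical stand-in for ASKCOS's mols_from_smiles_list:
    one Chem.MolFromSmiles call per SMILES string."""
    return [Chem.MolFromSmiles(s) for s in smiles_list]


mols = mols_from_smiles_list_sketch(['CC(=O)O', 'OCC'])
print([m.GetNumAtoms() for m in mols])  # [4, 3] (heavy atoms only)
```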
The magic code can be translated into:
from makeit.utilities.io.draw import mols_from_smiles_list  # ASKCOS helper

reactants_smiles, agents_smiles, products_smiles = rxn_string.split('>')
package_results = []
for mols in (reactants_smiles, agents_smiles, products_smiles):
    x = mols.split('.')           # x is a list of SMILES strings
    y = mols_from_smiles_list(x)  # y is a list of molecule objects
    package_results.append(y)
reactants, agents, products = package_results
Written out this way, the code is just not as cool.
I have no idea about the second question. May I ask where the parameters
threshold_unmapped_reactant_atoms and move_unmmapped_reactants_to_agents are
defined?
Best,
Hongbin Yang 杨弘宾, Ph.D.
Research: Toxicophore and Chemoinformatics
On 10/22/2019 13:08, Benjamin Datko <benjamin.datko....@gmail.com> wrote:
Hello all,
While reading the source code for ASKCOS
(https://github.com/connorcoley/ASKCOS/blob/master/makeit/utilities/io/draw.py)
I noticed this code snippet (line 216 on GitHub):
reactants, agents, products = [mols_from_smiles_list(x) for x in
[mols.split('.') for mols in rxn_string.split('>')]]
When the above code is applied to a SMILES reaction string, the result unpacks
the reactant, agent, and product mol objects into the respective variables,
with pretty good accuracy. The function 'mols_from_smiles_list' essentially just
applies Chem.MolFromSmiles over a list of SMILES.
I think this code snippet is really cool, but I cannot find any documentation on
how it works. Searching this mailing list, I came across the thread
(https://sourceforge.net/p/rdkit/mailman/message/36316849/), where this
operation of labeling reactants, agents, and products seems to be determined by
threshold_unmapped_reactant_atoms, as explained in the quoted text from that
message (linked above):
Here's what's going on: By default the cartridge code does an extra step after
reading a reaction from SMILES/SMARTS: it looks at all the reactants and moves
any that don't have a sufficient fraction of mapped atoms to the agents. We do
this by default because the reactions that we found "in the wild" often have
agents, solvents, etc. mixed in with the reactants. The key parameter used
there is threshold_unmapped_reactant_atoms, which defaults to 0.2.
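The cartridge step quoted above appears to have a counterpart in RDKit's Python API: ChemicalReaction.RemoveUnmappedReactantTemplates, which moves reactant templates whose atom-mapping ratio is below a threshold (0.2 by default) to the agent templates. The sketch below assumes that method behaves the same way as the cartridge option; the reaction, with an unmapped dichloromethane written as a "reactant", is a made-up example, and role_counts is a hypothetical helper name.

```python
from rdkit.Chem import rdChemReactions

# Hypothetical example: mapped acetic acid + mapped ethanol, with unmapped
# dichloromethane (ClCCl) written as a reactant, as often happens "in the wild".
RXN_SMILES = ("[CH3:1][C:2](=[O:3])O.[CH3:5][CH2:6][OH:7].ClCCl"
              ">>[CH3:1][C:2](=[O:3])[O:7][CH2:6][CH3:5]")


def role_counts(rxn_smiles, threshold):
    """Parse a reaction SMILES, move poorly mapped reactants to the agents,
    and return (n_reactants, n_agents, n_products)."""
    rxn = rdChemReactions.ReactionFromSmarts(rxn_smiles, useSmiles=True)
    # Reactants whose fraction of mapped atoms is below `threshold` become agents.
    rxn.RemoveUnmappedReactantTemplates(threshold)
    return (rxn.GetNumReactantTemplates(),
            rxn.GetNumAgentTemplates(),
            rxn.GetNumProductTemplates())


print(role_counts(RXN_SMILES, 0.2))  # (2, 1, 1): only ClCCl (0/3 mapped) moves
print(role_counts(RXN_SMILES, 0.8))  # (1, 2, 1): acetic acid (3/4 mapped) also moves
```

Raising the threshold makes the rule stricter: at 0.8, acetic acid, with only three of its four atoms mapped, is also reclassified as an agent.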
The only further reading I can find is Greg's paper
(https://pubs.acs.org/doi/10.1021/ci5006614). I have two main questions:
1. Where in the code is this atom mapping being applied? I cannot tell when
this method is applied or where the metadata is saved. Applying the code
snippet above to a SMILES reaction string results in a list of
rdkit.Chem.rdchem.Mol objects. I cannot find any static method or attribute
specifying whether a molecule is a reactant, agent, or product when inspecting
a mol object with help() in a Python terminal.
2. How can I change the values of the variables
threshold_unmapped_reactant_atoms and move_unmmapped_reactants_to_agents? I am
using RDKit version 2019.03.4 in an Anaconda environment. I want to experiment
with changing the mapping threshold.
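Regarding question 1, a plain Mol does not record its role; in the one-liner the role comes only from which of the three lists the molecule ends up in. A sketch, assuming RDKit's ChemicalReaction object is an acceptable alternative container: parsing the same kind of string with rdChemReactions.ReactionFromSmarts(..., useSmiles=True) keeps reactants, agents, and products in separate collections on the reaction. The reaction string is a made-up example.

```python
from rdkit import Chem
from rdkit.Chem import rdChemReactions

# Hypothetical esterification with an acid catalyst written as the agent.
rxn = rdChemReactions.ReactionFromSmarts("CC(=O)O.OCC>[H+]>CC(=O)OCC.O",
                                         useSmiles=True)

# The role is given by which collection a molecule is stored in,
# not by an attribute on the Mol itself.
reactants = rxn.GetReactants()
agents = rxn.GetAgents()
products = rxn.GetProducts()

print(len(reactants), len(agents), len(products))  # 2 1 2
print(isinstance(reactants[0], Chem.Mol))          # True
print([m.GetNumAtoms() for m in reactants])        # [4, 3]
```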
Very Respectfully,
Benjamin
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss