Re: [Rdkit-discuss] parsing reactions for reactants, agents, products

Benjamin Datko Tue, 22 Oct 2019 07:33:20 -0700

Hi Hongbin,

Thank you for breaking the code down. I am still new to Python and all its
Pythonic ways. I did not notice that reactants, agents, and products were
delimited by '>'. After you break it down, the code does lose its magic. =P
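
For my own notes, here is a minimal sketch of that split in plain Python.
The function name split_reaction_smiles and the example reaction string are
made up for illustration, and the step that turns each SMILES into a mol
object (Chem.MolFromSmiles) is left out so the snippet runs without RDKit.

```python
def split_reaction_smiles(rxn_string):
    """Split 'reactants>agents>products' into three lists of SMILES strings."""
    parts = rxn_string.split('>')
    if len(parts) != 3:
        raise ValueError("expected exactly two '>' separators")
    # An empty section (e.g. no agents) becomes [] rather than ['']
    return [part.split('.') if part else [] for part in parts]


# Hypothetical example: esterification with an acid catalyst as the agent
reactants, agents, products = split_reaction_smiles(
    'CC(=O)O.OCC>[H+]>CC(=O)OCC.O')
print(reactants, agents, products)
```

The three lists only carry their role through their position in the string,
which is presumably why the resulting mol objects have no reactant, agent, or
product attribute of their own.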

Do you happen to have any good references on hand that describe the
standards used in RDKit, or in cheminformatics generally, for molecular and
reaction representation?

My second question relates to the discussion I found in this thread (
https://sourceforge.net/p/rdkit/mailman/message/36316849/). I believe these
parameters belong to the PgSQL RDKit implementation, but I am not
sure. Below is a recursive search of the RDKit source downloaded
from GitHub (https://github.com/greglandrum/rdkit).

$ pwd
Downloads/rdkit-master

$ grep -r move_unmmapped_reactants_to_agents .
./Code/PgSQL/rdkit/guc.c:static bool rdkit_move_unmmapped_reactants_to_agents = true;
./Code/PgSQL/rdkit/guc.c:   "rdkit.move_unmmapped_reactants_to_agents",
./Code/PgSQL/rdkit/guc.c:   &rdkit_move_unmmapped_reactants_to_agents,
./Code/PgSQL/rdkit/guc.c:  return rdkit_move_unmmapped_reactants_to_agents;
./Code/PgSQL/rdkit/expected/reaction.out:SET rdkit.move_unmmapped_reactants_to_agents=true;
./Code/PgSQL/rdkit/expected/reaction.out:SET rdkit.move_unmmapped_reactants_to_agents=false;
./Code/PgSQL/rdkit/expected/reaction.out:SET rdkit.move_unmmapped_reactants_to_agents=true;
./Code/PgSQL/rdkit/expected/reaction.out:SET rdkit.move_unmmapped_reactants_to_agents=false;
./Code/PgSQL/rdkit/expected/reaction.out:SET rdkit.move_unmmapped_reactants_to_agents=true;
./Code/PgSQL/rdkit/sql/reaction.sql:SET rdkit.move_unmmapped_reactants_to_agents=true;
./Code/PgSQL/rdkit/sql/reaction.sql:SET rdkit.move_unmmapped_reactants_to_agents=false;
./Code/PgSQL/rdkit/sql/reaction.sql:SET rdkit.move_unmmapped_reactants_to_agents=true;
./Code/PgSQL/rdkit/sql/reaction.sql:SET rdkit.move_unmmapped_reactants_to_agents=false;
./Code/PgSQL/rdkit/sql/reaction.sql:SET rdkit.move_unmmapped_reactants_to_agents=true;

$ grep -r threshold_unmapped_reactant_atoms .
./Code/PgSQL/rdkit/guc.c:static double rdkit_threshold_unmapped_reactant_atoms = 0.2;
./Code/PgSQL/rdkit/guc.c:   "rdkit.threshold_unmapped_reactant_atoms",
./Code/PgSQL/rdkit/guc.c:   &rdkit_threshold_unmapped_reactant_atoms,
./Code/PgSQL/rdkit/guc.c:  return rdkit_threshold_unmapped_reactant_atoms;
./Code/PgSQL/rdkit/expected/reaction.out:SET rdkit.threshold_unmapped_reactant_atoms=0.2;
./Code/PgSQL/rdkit/expected/reaction.out:SET rdkit.threshold_unmapped_reactant_atoms=0.9;
./Code/PgSQL/rdkit/expected/reaction.out:SET rdkit.threshold_unmapped_reactant_atoms=0.2;
./Code/PgSQL/rdkit/sql/reaction.sql:SET rdkit.threshold_unmapped_reactant_atoms=0.2;
./Code/PgSQL/rdkit/sql/reaction.sql:SET rdkit.threshold_unmapped_reactant_atoms=0.9;
./Code/PgSQL/rdkit/sql/reaction.sql:SET rdkit.threshold_unmapped_reactant_atoms=0.2;
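
To check my understanding of the description in that thread, here is how I
picture the thresholding, as a sketch in plain Python. This is not the
cartridge's actual code: the function name, the (name, mapped, total) input
format, and the strict '<' comparison are all my own assumptions. Only the
default of 0.2 comes from guc.c above.

```python
def move_unmapped_to_agents(reactants, threshold=0.2):
    """Partition reactants by their fraction of atom-mapped atoms.

    reactants: list of (name, mapped_atoms, total_atoms) tuples.
    This input format is hypothetical, chosen only for illustration.
    """
    kept, agents = [], []
    for name, mapped, total in reactants:
        fraction = mapped / total if total else 0.0
        # Assumption: a reactant below the threshold is treated as an agent
        if fraction < threshold:
            agents.append(name)
        else:
            kept.append(name)
    return kept, agents


# Made-up counts: two fully mapped reactants and one unmapped solvent
example = [('acid', 4, 4), ('alcohol', 3, 3), ('solvent', 0, 5)]
kept, agents = move_unmapped_to_agents(example)
print(kept, agents)
```

If that picture is right, raising the threshold to 0.9 (as reaction.sql does)
would push partially mapped reactants into the agents as well.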

On Tue, Oct 22, 2019 at 2:29 AM Hongbin Yang <yanyangh...@163.com> wrote:

> Hi Benjamin,
>
> The magic code uses a feature of python named "list comprehension".
> https://www.pythonforbeginners.com/basics/list-comprehensions-in-python
>
> It does not read the rxn string directly, but splits the string first.
> Since the reaction string should be `reactants smiles>agents smiles>product
> smiles`, we can get these SMILES strings by "rxn_string.split('>')".
> Then for each part, we can use splitter "." to get single molecules. So
> finally, [mols.split('.') for mols in rxn_string.split('>')] becomes
> [[reactant1, reactant2, ..], [agent1, agent2, ..], [product1, product2,
> ...]]. But they are all SMILES strings.
>
> mols_from_smiles_list is defined here:
> https://github.com/connorcoley/ASKCOS/blob/master/makeit/utilities/io/draw.py#L16
> It just converts a list of SMILES strings into a list of molecules. The only
> API it uses is "Chem.MolFromSmiles".
>
> The magic code can be translated into:
>
> reactants_smiles, agents_smiles, product_smiles = rxn_string.split('>')
> package_results = []
> for mols in (reactants_smiles, agents_smiles, product_smiles):
>   x = mols.split('.')            # x is a list of SMILES strings
>   y = mols_from_smiles_list(x)   # y is a list of molecule objects
>   package_results.append(y)
> reactants, agents, products = package_results
>
> Written this way, the code is no longer as cool.
>
> I have no idea about the second question. May I ask where the
> parameters threshold_unmapped_reactant_atoms and
> move_unmmapped_reactants_to_agents are defined?
>
> Best,
>
> Hongbin Yang 杨弘宾, Ph.D.
> Research: Toxicophore and Chemoinformatics
> On 10/22/2019 13:08, Benjamin Datko <benjamin.datko....@gmail.com> wrote:
>
> Hello all,
>
> While reading the source code for ASKCOS (
> https://github.com/connorcoley/ASKCOS/blob/master/makeit/utilities/io/draw.py)
> I noticed this code snippet (line 216 on the GitHub):
>
> reactants, agents, products = [mols_from_smiles_list(x) for x in
> [mols.split('.') for mols in rxn_string.split('>')]]
>
> When the above code is applied to a SMILES reaction string, the result
> unpacks the reactant, agent, and product mol objects into their respective
> variables, with pretty good accuracy.  The function 'mols_from_smiles_list'
> essentially just applies Chem.MolFromSmiles over a list of SMILES.
>
> I think this code snippet is really cool, but I cannot find any
> documentation on how it works. Searching this mailing list, I came
> across the thread (
> https://sourceforge.net/p/rdkit/mailman/message/36316849/) where this
> labeling of reactants, agents, and products seems to be
> determined by threshold_unmapped_reactant_atoms, as explained in the text
> quoted from that message:
>
>> Here's what's going on: By default the cartridge code does an extra step
>> after reading a reaction from SMILES/SMARTS: it looks at all the reactants
>> and moves any that don't have a sufficient fraction of mapped atoms to the
>> agents. We do this by default because the reactions that we found "in the
>> wild" often have agents, solvents, etc. mixed in with the reactants. The
>> key parameter used there is threshold_unmapped_reactant_atoms, which
>> defaults to 0.2.
>
>
> The only further reading I can find is from Greg's paper (
> https://pubs.acs.org/doi/10.1021/ci5006614). I have two main questions:
>
> 1. Where in the code is this atom mapping being applied? I cannot tell
> when this method is applied or where the metadata is saved.
> Applying the code snippet above to a SMILES reaction string results in a
> list of rdkit.Chem.rdchem.Mol objects. I cannot seem to find any static
> method or attributes specifying if it's a reactant, agent, or product when
> inspecting a mol object using help in a python terminal.
>
> 2. How can I change the values of the variables
> threshold_unmapped_reactant_atoms and
> move_unmmapped_reactants_to_agents? I am using rdkit version 2019.03.4
> in an Anaconda environment. I want to experiment with changing the mapping
> threshold.
>
> Very Respectfully,
>
> Benjamin
>
>

_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
