Hi Benjamin,

On Tue, Oct 22, 2019 at 4:32 PM Benjamin Datko <benjamin.datko....@gmail.com>
wrote:

> Hi Hongbin,
>
> Thank you for breaking the code down. I am still new to Python and all its
> pythonic ways. I did not notice that reactants, agents, and products were
> delimited by '>'. After you break it down, the code does lose its magic. =P
>

Yeah, as long as you have clean reaction data, that works great (see below).


>
> Do you happen to have any good references on hand that describe some of the
> standards used in RDKit, or in cheminformatics generally, for molecular and
> reaction representation?
>

The SMILES/SMARTS-based formats are documented here:
https://www.daylight.com/dayhtml/doc/theory/
That's a great place to start.


> The second question relates to the discussion I found in this thread (
> https://sourceforge.net/p/rdkit/mailman/message/36316849/). I believe
> these parameters belong to the PgSQL RDKit implementation, but I am not
> sure. Below is a recursive search of the RDKit source I downloaded
> from GitHub (https://github.com/greglandrum/rdkit).
>

The method RemoveUnmappedReactantTemplates() on the chemical reaction
object is there for a few reasons. The primary one is that "real world"
reaction data often isn't 100% clean and can include solvents/reagents in
the reactants section (be that in SMILES or RXN files).
RemoveUnmappedReactantTemplates() handles that with a simple heuristic:
"reactants" that don't have a sufficient fraction of mapped atoms (set by a
threshold) are either completely removed or moved to the agents.
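For Benjamin's second question: in Python the threshold is passed as an argument to this method rather than set as a configuration variable like the cartridge's rdkit.threshold_unmapped_reactant_atoms. Below is a minimal sketch under some assumptions: the atom-mapped reaction is made up for illustration, and, as far as I know, AllChem.ReactionFromSmarts does not apply this heuristic on its own (the cartridge does it by default), so the method is called explicitly. The arguments are passed positionally: the threshold first, then whether the removed templates go to the agents.

```python
from rdkit.Chem import AllChem

# Made-up atom-mapped esterification: acetic acid + methanol, with
# dichloromethane (ClCCl, no atom maps) mixed into the reactant section.
rxn = AllChem.ReactionFromSmarts(
    "[CH3:6][C:1](=[O:2])[OH:3].[CH3:4][OH:5].ClCCl>>[CH3:6][C:1](=[O:2])[O:5][CH3:4]"
)
print(rxn.GetNumReactantTemplates(), rxn.GetNumAgentTemplates())  # 3 0

# Threshold 0.2 (the same value as the cartridge default), and True to move
# the unmapped "reactant" into the agent templates instead of dropping it.
rxn.RemoveUnmappedReactantTemplates(0.2, True)
print(rxn.GetNumReactantTemplates(), rxn.GetNumAgentTemplates())  # 2 1
```

The fully mapped acid and alcohol stay as reactants, while the solvent, with no mapped atoms, ends up as the single agent. Passing False as the second argument drops it instead, which roughly mirrors setting rdkit.move_unmmapped_reactants_to_agents=false in the cartridge.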

Hope this helps.
-greg


> $ pwd
> Downloads/rdkit-master
>
> $ grep -r move_unmmapped_reactants_to_agents .
> ./Code/PgSQL/rdkit/guc.c:static bool rdkit_move_unmmapped_reactants_to_agents = true;
> ./Code/PgSQL/rdkit/guc.c:   "rdkit.move_unmmapped_reactants_to_agents",
> ./Code/PgSQL/rdkit/guc.c:   &rdkit_move_unmmapped_reactants_to_agents,
> ./Code/PgSQL/rdkit/guc.c:  return rdkit_move_unmmapped_reactants_to_agents;
> ./Code/PgSQL/rdkit/expected/reaction.out:SET rdkit.move_unmmapped_reactants_to_agents=true;
> ./Code/PgSQL/rdkit/expected/reaction.out:SET rdkit.move_unmmapped_reactants_to_agents=false;
> ./Code/PgSQL/rdkit/expected/reaction.out:SET rdkit.move_unmmapped_reactants_to_agents=true;
> ./Code/PgSQL/rdkit/expected/reaction.out:SET rdkit.move_unmmapped_reactants_to_agents=false;
> ./Code/PgSQL/rdkit/expected/reaction.out:SET rdkit.move_unmmapped_reactants_to_agents=true;
> ./Code/PgSQL/rdkit/sql/reaction.sql:SET rdkit.move_unmmapped_reactants_to_agents=true;
> ./Code/PgSQL/rdkit/sql/reaction.sql:SET rdkit.move_unmmapped_reactants_to_agents=false;
> ./Code/PgSQL/rdkit/sql/reaction.sql:SET rdkit.move_unmmapped_reactants_to_agents=true;
> ./Code/PgSQL/rdkit/sql/reaction.sql:SET rdkit.move_unmmapped_reactants_to_agents=false;
> ./Code/PgSQL/rdkit/sql/reaction.sql:SET rdkit.move_unmmapped_reactants_to_agents=true;
>
> $ grep -r threshold_unmapped_reactant_atoms .
> ./Code/PgSQL/rdkit/guc.c:static double rdkit_threshold_unmapped_reactant_atoms = 0.2;
> ./Code/PgSQL/rdkit/guc.c:   "rdkit.threshold_unmapped_reactant_atoms",
> ./Code/PgSQL/rdkit/guc.c:   &rdkit_threshold_unmapped_reactant_atoms,
> ./Code/PgSQL/rdkit/guc.c:  return rdkit_threshold_unmapped_reactant_atoms;
> ./Code/PgSQL/rdkit/expected/reaction.out:SET rdkit.threshold_unmapped_reactant_atoms=0.2;
> ./Code/PgSQL/rdkit/expected/reaction.out:SET rdkit.threshold_unmapped_reactant_atoms=0.9;
> ./Code/PgSQL/rdkit/expected/reaction.out:SET rdkit.threshold_unmapped_reactant_atoms=0.2;
> ./Code/PgSQL/rdkit/sql/reaction.sql:SET rdkit.threshold_unmapped_reactant_atoms=0.2;
> ./Code/PgSQL/rdkit/sql/reaction.sql:SET rdkit.threshold_unmapped_reactant_atoms=0.9;
> ./Code/PgSQL/rdkit/sql/reaction.sql:SET rdkit.threshold_unmapped_reactant_atoms=0.2;
>
>
>
> On Tue, Oct 22, 2019 at 2:29 AM Hongbin Yang <yanyangh...@163.com> wrote:
>
>> Hi Benjamin,
>>
>> The magic code uses a Python feature called "list comprehension".
>> https://www.pythonforbeginners.com/basics/list-comprehensions-in-python
>>
>> It does not parse the rxn string directly; it splits the string first.
>> Since a reaction string has the form `reactant SMILES>agent SMILES>product
>> SMILES`, we can get these three SMILES strings with "rxn_string.split('>')".
>> Then each part is split on "." to get the individual molecules. So
>> finally, [mols.split('.') for mols in rxn_string.split('>')] becomes
>> [[reactant1, reactant2, ...], [agent1, agent2, ...], [product1, product2,
>> ...]]. At this point they are all still SMILES strings.
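The splitting step can be tried on its own in plain Python, without RDKit. A minimal sketch; split_reaction_smiles is a hypothetical helper name wrapping the nested comprehension from the snippet, and the example reaction is made up:

```python
def split_reaction_smiles(rxn_string):
    """Split 'reactants>agents>products' into three lists of SMILES strings.

    split_reaction_smiles is a hypothetical name; the body is the nested
    comprehension from the ASKCOS snippet, minus the RDKit conversion.
    """
    # Outer split on '>' gives the three sections; inner split on '.'
    # separates the individual molecules within each section.
    return [mols.split('.') for mols in rxn_string.split('>')]


print(split_reaction_smiles("CC(=O)O.CO>OS(=O)(=O)O>CC(=O)OC"))
# [['CC(=O)O', 'CO'], ['OS(=O)(=O)O'], ['CC(=O)OC']]

# With no agents ('>>'), the middle section is the empty string,
# and ''.split('.') returns [''] rather than [].
print(split_reaction_smiles("CC(=O)O.CO>>CC(=O)OC"))
# [['CC(=O)O', 'CO'], [''], ['CC(=O)OC']]
```

The three-way unpacking into reactants, agents, and products only works when the string contains exactly two '>' characters.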
>>
>> mols_from_smiles_list is defined here:
>> https://github.com/connorcoley/ASKCOS/blob/master/makeit/utilities/io/draw.py#L16
>> It just converts a list of SMILES strings into a list of molecules. The
>> only API it uses is "Chem.MolFromSmiles".
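A function like this can be sketched in a few lines. This is a hypothetical stand-in written for illustration, not the ASKCOS code itself; skipping empty strings is a choice made here, so that an empty agents section yields an empty list:

```python
from rdkit import Chem


def mols_from_smiles_list(smiles_list):
    """Hypothetical stand-in for ASKCOS's mols_from_smiles_list.

    Converts each SMILES string with Chem.MolFromSmiles. Empty strings are
    skipped (a choice made for this sketch). Unparsable SMILES yield None,
    because that is what Chem.MolFromSmiles returns for them.
    """
    return [Chem.MolFromSmiles(smi) for smi in smiles_list if smi]


mols = mols_from_smiles_list(['CC(=O)O', 'CO'])
print([m.GetNumAtoms() for m in mols])  # [4, 2]
```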
>>
>> The magic code can be translated into:
>>
>> # mols_from_smiles_list comes from ASKCOS (makeit/utilities/io/draw.py)
>> reactants_smiles, agents_smiles, products_smiles = rxn_string.split('>')
>> package_results = []
>> for mols in (reactants_smiles, agents_smiles, products_smiles):
>>     x = mols.split('.')            # x is a list of SMILES strings
>>     y = mols_from_smiles_list(x)   # y is a list of molecule objects
>>     package_results.append(y)
>> reactants, agents, products = package_results
>>
>> Written out like this, the code is not as cool.
>>
>> I have no idea about the second question. May I ask where the
>> parameters threshold_unmapped_reactant_atoms and
>> move_unmmapped_reactants_to_agents are defined?
>>
>> Best,
>>
>> Hongbin Yang 杨弘宾, Ph.D.
>> Research: Toxicophore and Chemoinformatics
>> On 10/22/2019 13:08, Benjamin Datko <benjamin.datko....@gmail.com> wrote:
>>
>> Hello all,
>>
>> While reading the source code for ASKCOS (
>> https://github.com/connorcoley/ASKCOS/blob/master/makeit/utilities/io/draw.py)
>> I noticed this code snippet (line 216 on GitHub):
>>
>> reactants, agents, products = [mols_from_smiles_list(x) for x in
>> [mols.split('.') for mols in rxn_string.split('>')]]
>>
>> When the above code is applied to a reaction SMILES string, it unpacks
>> the reactant, agent, and product Mol objects into the respective
>> variables, with pretty good accuracy. The function 'mols_from_smiles_list'
>> essentially just applies Chem.MolFromSmiles over a list of SMILES.
>>
>> I think this code snippet is really cool but I cannot find any
>> documentation on how this is working. Searching this mailing list I came
>> across the thread (
>> https://sourceforge.net/p/rdkit/mailman/message/36316849/) where this
>> operation of labeling reactants, agents, and products seems to be
>> determined by the threshold_unmapped_reactant_atoms parameter, explained in
>> the quoted text from the message (linked above):
>>
>> Here's what's going on: By default the cartridge code does an extra step
>>> after reading a reaction from SMILES/SMARTS: it looks at all the reactants
>>> and moves any that don't have a sufficient fraction of mapped atoms to the
>>> agents. We do this by default because the reactions that we found "in the
>>> wild" often have agents, solvents, etc. mixed in with the reactants. The
>>> key parameter used there is threshold_unmapped_reactant_atoms, which
>>> defaults to 0.2.
>>
>>
>> The only further reading I can find is from Greg's paper (
>> https://pubs.acs.org/doi/10.1021/ci5006614). I have two main questions:
>>
>> 1. Where in the code is this atom mapping being applied? I cannot tell
>> when this method is being applied or where the metadata is being saved.
>> Applying the code snippet above to a SMILES reaction string results in a
>> list of rdkit.Chem.rdchem.Mol objects. I cannot seem to find any static
>> method or attributes specifying if it's a reactant, agent, or product when
>> inspecting a mol object using help in a python terminal.
>>
>> 2. How can I change the value of the
>> variables threshold_unmapped_reactant_atoms
>> and move_unmmapped_reactants_to_agents? I am using rdkit version 2019.03.4
>> in an Anaconda environment. I want to experiment changing the mapping
>> threshold.
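On question 1, a minimal sketch, assuming the reaction string is parsed with RDKit's AllChem.ReactionFromSmarts instead of being split by hand: the role is stored on the ChemicalReaction object, not on the molecules. With the split-based snippet, the role exists only as the list (and variable) a Mol ends up in. The reaction below is made up for illustration, and templates parsed from SMARTS are query molecules rather than ordinary ones:

```python
from rdkit.Chem import AllChem

# Made-up example: acetic acid + methanol, sulfuric acid as the agent,
# methyl acetate as the product.
rxn = AllChem.ReactionFromSmarts("CC(=O)O.CO>OS(=O)(=O)O>CC(=O)OC")

# The ChemicalReaction keeps reactants, agents, and products in three
# separate template lists; the Mol objects carry no role of their own.
reactants = [rxn.GetReactantTemplate(i) for i in range(rxn.GetNumReactantTemplates())]
agents = [rxn.GetAgentTemplate(i) for i in range(rxn.GetNumAgentTemplates())]
products = [rxn.GetProductTemplate(i) for i in range(rxn.GetNumProductTemplates())]

for role, mols in (("reactants", reactants), ("agents", agents), ("products", products)):
    print(role, [m.GetNumAtoms() for m in mols])
# reactants [4, 2]
# agents [5]
# products [5]
```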
>>
>> Very Respectfully,
>>
>> Benjamin
>>
>> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>