Hi Benjamin,

On Tue, Oct 22, 2019 at 4:32 PM Benjamin Datko <benjamin.datko....@gmail.com> wrote:
> Hi Hongbin,
>
> Thank you for breaking the code down. I am still new to Python and all its pythonic ways. I did not notice that reactants, agents, and products were delimited by '>'. After you break it down, the code does lose its magic. =P
>

Yeah, as long as you have clean reaction data that works great (see below).

>
> Do you happen to have any good references on hand that describe some of the standards used in RDKit, or in cheminformatics generally, for molecule and reaction representation?
>

The SMILES/SMARTS-based formats are documented here:
https://www.daylight.com/dayhtml/doc/theory/
That's a great place to start.

> The second question corresponds to the discussion I found in this thread (https://sourceforge.net/p/rdkit/mailman/message/36316849/). I believe these parameters correspond to the PgSQL RDKit implementation, but I am not sure. Below I show a recursive search from the downloaded source of RDKit from GitHub (https://github.com/greglandrum/rdkit).
>

The method RemoveUnmappedReactantTemplates() on the chemical reaction object is there for a few reasons. The primary one is that "real world" reaction data often isn't 100% clean and can include solvents/reagents in the reactants section (be that in SMILES or RXN files). RemoveUnmappedReactantTemplates() solves that using a simple heuristic: "reactants" that contain more than a threshold percentage of unmapped atoms are either completely removed or marked as agents.

Hope this helps.
-greg

> $ pwd
> Downloads/rdkit-master
>
> $ grep -r move_unmmapped_reactants_to_agents .
> ./Code/PgSQL/rdkit/guc.c:static bool rdkit_move_unmmapped_reactants_to_agents = true;
> ./Code/PgSQL/rdkit/guc.c: "rdkit.move_unmmapped_reactants_to_agents",
> ./Code/PgSQL/rdkit/guc.c: &rdkit_move_unmmapped_reactants_to_agents,
> ./Code/PgSQL/rdkit/guc.c: return rdkit_move_unmmapped_reactants_to_agents;
> ./Code/PgSQL/rdkit/expected/reaction.out:SET rdkit.move_unmmapped_reactants_to_agents=true;
> ./Code/PgSQL/rdkit/expected/reaction.out:SET rdkit.move_unmmapped_reactants_to_agents=false;
> ./Code/PgSQL/rdkit/expected/reaction.out:SET rdkit.move_unmmapped_reactants_to_agents=true;
> ./Code/PgSQL/rdkit/expected/reaction.out:SET rdkit.move_unmmapped_reactants_to_agents=false;
> ./Code/PgSQL/rdkit/expected/reaction.out:SET rdkit.move_unmmapped_reactants_to_agents=true;
> ./Code/PgSQL/rdkit/sql/reaction.sql:SET rdkit.move_unmmapped_reactants_to_agents=true;
> ./Code/PgSQL/rdkit/sql/reaction.sql:SET rdkit.move_unmmapped_reactants_to_agents=false;
> ./Code/PgSQL/rdkit/sql/reaction.sql:SET rdkit.move_unmmapped_reactants_to_agents=true;
> ./Code/PgSQL/rdkit/sql/reaction.sql:SET rdkit.move_unmmapped_reactants_to_agents=false;
> ./Code/PgSQL/rdkit/sql/reaction.sql:SET rdkit.move_unmmapped_reactants_to_agents=true;
>
> $ grep -r threshold_unmapped_reactant_atoms .
> ./Code/PgSQL/rdkit/guc.c:static double rdkit_threshold_unmapped_reactant_atoms = 0.2;
> ./Code/PgSQL/rdkit/guc.c: "rdkit.threshold_unmapped_reactant_atoms",
> ./Code/PgSQL/rdkit/guc.c: &rdkit_threshold_unmapped_reactant_atoms,
> ./Code/PgSQL/rdkit/guc.c: return rdkit_threshold_unmapped_reactant_atoms;
> ./Code/PgSQL/rdkit/expected/reaction.out:SET rdkit.threshold_unmapped_reactant_atoms=0.2;
> ./Code/PgSQL/rdkit/expected/reaction.out:SET rdkit.threshold_unmapped_reactant_atoms=0.9;
> ./Code/PgSQL/rdkit/expected/reaction.out:SET rdkit.threshold_unmapped_reactant_atoms=0.2;
> ./Code/PgSQL/rdkit/sql/reaction.sql:SET rdkit.threshold_unmapped_reactant_atoms=0.2;
> ./Code/PgSQL/rdkit/sql/reaction.sql:SET rdkit.threshold_unmapped_reactant_atoms=0.9;
> ./Code/PgSQL/rdkit/sql/reaction.sql:SET rdkit.threshold_unmapped_reactant_atoms=0.2;
>
>
> On Tue, Oct 22, 2019 at 2:29 AM Hongbin Yang <yanyangh...@163.com> wrote:
>
>> Hi Benjamin,
>>
>> The magic code uses a Python feature called "list comprehension":
>> https://www.pythonforbeginners.com/basics/list-comprehensions-in-python
>>
>> It does not read the rxn string directly, but splits the string first. Since the reaction string should be `reactants smiles>agents smiles>products smiles`, we can get these SMILES strings with rxn_string.split('>'). Then for each part, we can split on "." to get the individual molecules. So finally, [mols.split('.') for mols in rxn_string.split('>')] becomes [[reactant1, reactant2, ...], [agent1, agent2, ...], [product1, product2, ...]]. At this point they are all still SMILES strings.
>>
>> mols_from_smiles_list is defined here:
>> https://github.com/connorcoley/ASKCOS/blob/master/makeit/utilities/io/draw.py#L16
>> It just reads the SMILES strings in a list into a list of molecules. The only API it uses is Chem.MolFromSmiles.
>>
>> The magic code can be translated into:
>>
>> reactants_smiles, agents_smiles, products_smiles = rxn_string.split('>')
>> package_results = []
>> for mols in (reactants_smiles, agents_smiles, products_smiles):
>>     x = mols.split('.')  # x is a list of SMILES strings
>>     y = mols_from_smiles_list(x)  # y is a list of molecule objects
>>     package_results.append(y)
>> reactants, agents, products = package_results
>>
>> Of course, the code is not as cool any more.
>>
>> I have no idea about the second question. May I ask where the parameters threshold_unmapped_reactant_atoms and move_unmmapped_reactants_to_agents are defined?
>>
>> Best,
>>
>> Hongbin Yang 杨弘宾, Ph.D.
>> Research: Toxicophore and Chemoinformatics
>> On 10/22/2019 13:08, Benjamin Datko <benjamin.datko....@gmail.com> wrote:
>>
>> Hello all,
>>
>> While reading the source code for ASKCOS (https://github.com/connorcoley/ASKCOS/blob/master/makeit/utilities/io/draw.py) I noticed this code snippet (line 216 on GitHub):
>>
>> reactants, agents, products = [mols_from_smiles_list(x) for x in [mols.split('.') for mols in rxn_string.split('>')]]
>>
>> When the above code is applied to a SMILES reaction string, the result unpacks the reactant, agent, and product mol objects into the respective variables, with pretty good accuracy. The function 'mols_from_smiles_list' essentially just applies Chem.MolFromSmiles over a list of SMILES.
>>
>> I think this code snippet is really cool, but I cannot find any documentation on how it works. Searching this mailing list I came across the thread (https://sourceforge.net/p/rdkit/mailman/message/36316849/), where this labeling of reactants, agents, and products seems to be determined by threshold_unmapped_reactant_atoms, as explained in the text quoted from that message:
>>
>>> Here's what's going on: By default the cartridge code does an extra step after reading a reaction from SMILES/SMARTS: it looks at all the reactants and moves any that don't have a sufficient fraction of mapped atoms to the agents. We do this by default because the reactions that we found "in the wild" often have agents, solvents, etc. mixed in with the reactants. The key parameter used there is threshold_unmapped_reactant_atoms, which defaults to 0.2.
>>
>> The only further reading I can find is Greg's paper (https://pubs.acs.org/doi/10.1021/ci5006614). I have two main questions:
>>
>> 1. Where in the code is this atom mapping being applied? I cannot tell when this method is being applied or where the metadata is being saved.
>>    Applying the code snippet above to a SMILES reaction string results in a list of rdkit.Chem.rdchem.Mol objects. When inspecting a mol object with help() in a Python terminal, I cannot find any method or attribute specifying whether it is a reactant, agent, or product.
>>
>> 2. How can I change the values of the variables threshold_unmapped_reactant_atoms and move_unmmapped_reactants_to_agents? I am using RDKit version 2019.03.4 in an Anaconda environment. I want to experiment with changing the mapping threshold.
>>
>> Very Respectfully,
>>
>> Benjamin
>>
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
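For anyone who wants to play with the threshold outside the cartridge, here is a minimal pure-Python sketch of the heuristic discussed above. It is not RDKit code. It assumes a deliberately simplified regex atom counter (bracket atoms plus the organic subset, no real SMILES parsing), and it follows the cartridge description quoted above: a reactant whose fraction of mapped atoms is below the threshold gets moved to the agents. The helper names split_reaction_smiles, mapped_fraction and move_unmapped_reactants are hypothetical. In this sketch, as in the ASKCOS snippet, a molecule's role is simply the list it lands in; nothing is attached to the molecule itself.

```python
import re

# ASSUMPTION: a simplified atom tokenizer, not RDKit's SMILES parser.
# Bracket atoms such as [CH3:1] or [Na+] are matched whole; outside brackets only
# the organic subset can appear (two-letter Br/Cl are tried before single letters).
ATOM_RE = re.compile(r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]")
# An atom-map number is the ":<digits>" right before a bracket atom's closing "]".
MAP_RE = re.compile(r":(\d+)\]$")


def split_reaction_smiles(rxn_smiles):
    """Hypothetical helper: split 'reactants>agents>products' into three lists of SMILES."""
    reactants, agents, products = (
        [s for s in part.split(".") if s] for part in rxn_smiles.split(">")
    )
    return reactants, agents, products


def mapped_fraction(smiles):
    """Hypothetical helper: fraction of atoms carrying a nonzero atom-map number."""
    atoms = ATOM_RE.findall(smiles)
    if not atoms:
        return 0.0
    mapped = 0
    for atom in atoms:
        match = MAP_RE.search(atom)
        if match and int(match.group(1)) > 0:
            mapped += 1
    return mapped / len(atoms)


def move_unmapped_reactants(rxn_smiles, threshold=0.2):
    """Hypothetical helper: move reactants whose mapped fraction is below `threshold` to the agents."""
    reactants, agents, products = split_reaction_smiles(rxn_smiles)
    kept = [r for r in reactants if mapped_fraction(r) >= threshold]
    moved = [r for r in reactants if mapped_fraction(r) < threshold]
    return kept, agents + moved, products


if __name__ == "__main__":
    # Esterification with dichloromethane (ClCCl, no mapped atoms) listed as a reactant.
    rxn = "[CH3:1][C:2](=[O:3])[OH:4].[CH3:5][OH:6].ClCCl>>[CH3:1][C:2](=[O:3])[O:6][CH3:5]"
    reactants, agents, products = move_unmapped_reactants(rxn)
    print("reactants:", reactants)
    print("agents:", agents)  # ClCCl has been moved here
    print("products:", products)
```

Raising the threshold (the cartridge tests above try 0.9) moves partially mapped reactants as well. Inside RDKit the corresponding step is the RemoveUnmappedReactantTemplates() method on a ChemicalReaction; its Python binding is believed to accept thresholdUnmappedAtoms and moveToAgentTemplates keyword arguments, but treat that as an assumption and confirm with help(rxn.RemoveUnmappedReactantTemplates) in your installed version.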