Hi Andrew,
There's definitely something odd going on here, and I will make a longer
reply once I figure out exactly what it is, but in the meantime a question,
a workaround, and a comment:
1) in the code you have this snippet:
# This gives: c1ccc(nc1)-n1ncc2ccc(nc21)C1CC1
# That SMILES appears to be incorrect!
Why do you think that's true?
2) If you add a call to Chem.SanitizeMol(hydrogren_mol) before any of the
calls to SMILES generation, it clears up the problem. The calls to
SetNumExplicitHs() are not necessary.
3) I suspect that you should be using Chem.FragmentOnBonds(). It's likely
more efficient than what you're currently doing.
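
A rough sketch of points 2) and 3), using the "OCCN" example from your message rather than your attached script. The helper names (hydrogen_core_smiles, star_core_smiles) are made up for illustration. The sketch also assumes the R-group atoms can simply be deleted and that the core is the largest fragment; both hold for this toy case but may not hold for your data.

```python
from rdkit import Chem


def hydrogen_core_smiles(mol, rgroup_atom_indices):
    """Delete the R-group atoms, then sanitize before writing SMILES."""
    rw = Chem.RWMol(mol)
    # Remove from the highest index down so the remaining indices stay valid.
    for idx in sorted(rgroup_atom_indices, reverse=True):
        rw.RemoveAtom(idx)
    core = rw.GetMol()
    # Sanitize the edited molecule before generating any SMILES.
    Chem.SanitizeMol(core)
    return Chem.MolToSmiles(core)


def star_core_smiles(mol, cut_bond_indices):
    """Cut the bonds with FragmentOnBonds and return the core with dummies."""
    # (0, 0) labels keep the dummy atoms unlabelled (isotope 0).
    labels = [(0, 0)] * len(cut_bond_indices)
    fragmented = Chem.FragmentOnBonds(mol, cut_bond_indices, dummyLabels=labels)
    pieces = Chem.GetMolFrags(fragmented, asMols=True)
    # Assumption for this toy example: the core is the largest fragment.
    core = max(pieces, key=lambda m: m.GetNumAtoms())
    return Chem.MolToSmiles(core)


mol = Chem.MolFromSmiles("OCCN")
# The O-C and C-N bonds are the ones to cut; atoms 0 and 3 are the R-groups.
cut_bonds = [
    mol.GetBondBetweenAtoms(0, 1).GetIdx(),
    mol.GetBondBetweenAtoms(2, 3).GetIdx(),
]
h_smiles = hydrogen_core_smiles(mol, [0, 3])
star_smiles = star_core_smiles(mol, cut_bonds)
print(h_smiles)
print(star_smiles)
```

Here h_smiles should come out as "CC" and should not change if you reparse and regenerate it. The star form carries one dummy atom per cut bond.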
-greg
On Tue, Feb 2, 2016 at 7:49 AM, Andrew Dalke <[email protected]>
wrote:
> Hi all,
>
> I have a problem that I think is due to my not understanding how to work
> with explicit (or perhaps implicit) hydrogens.
>
> In my project, I want to find the core of a molecule as well as its
> R-groups. I use a SMARTS pattern to find the bonds to cut, then want to
> store two versions of the core:
>
> 1) replace the R-groups with hydrogens, and store the canonical SMILES
> for the hydrogenated core. I do this by adding explicit hydrogens to each
> of the atoms where a bond was cut,
>
> 2) add "-[*]" extensions on each of the atoms where a bond was cut, and
> store its canonical SMILES
>
> For example, if the structure is "OCCN" with a core of "CC" and the two
> R-groups "O" and "N", then the two versions might be (1) "CC" and (2)
> "[*]CC[*]".
>
>
> To make sure I did things correctly, I used string substitution to replace
> the "[*]" terms from form (2) with "[H]", then parsed and regenerated the
> SMILES to create (2H). This should generate the identical output as (1).
>
> Unfortunately, it doesn't. But if I take the output from (1) and
> reparse/regenerate the SMILES string, then while I expect the SMILES to be
> unchanged, it actually produces a new SMILES (1'), which is identical to
> (2H).
>
>
> I've attached a reproducible example. Here's the output:
>
> Using RDKit version 2016.03.1.dev1
> == Increase explicit hydrogen count & make SMILES / reperceive & make
> SMILES ==
> hydrogen smiles: c1ccc(nc1)-n1ncc2ccc(nc21)C1CC1
> reparsed h-smiles: c1ccc(-n2ncc3ccc(C4CC4)nc32)nc1
>
> == Add '*' atoms & make SMILES / *->'H' / reperceive & make SMILES ==
> star smiles: [*]c1cc(nc2c1c([*])nn2-c1ccccn1)C1CC1
> substituted smiles: [H]c1cc(nc2c1c([H])nn2-c1ccccn1)C1CC1
> absorbed smiles: c1ccc(-n2ncc3ccc(C4CC4)nc32)nc1
>
> In this output, the "hydrogen smiles" is (1), which I generated by adding
> explicit hydrogens, and the "reparsed h-smiles" is (1'), which I generate
> by reparsing/canonicalizing (1).
>
> The "star smiles" is (2), and the "substituted smiles" is
> the text replacement of "*" with "H". Finally, I parse the substituted
> smiles and generate the final canonical SMILES, which I term the "absorbed
> smiles" because the SMILES parser by default places the '[H]' terms on the
> parent atom. This is (2H) in my above notation.
>
> I expected the "hydrogen smiles" to be the same as the "reparsed h-smiles"
> and the "absorbed smiles", but it is not.
>
> I can't figure out what I did wrong.
>
> As a work-around, I could reperceive my (1) to get (1'), or use the *->H
> technique to get (2H). However, that seems inelegant, and parsing the
> SMILES string has a high overhead so this will substantially lower my
> performance.
>
>
> Andrew
> [email protected]
> _______________________________________________
> Rdkit-discuss mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss