Re: [Rdkit-discuss] Implicit Hydrogens On Aromatic Heteroatoms

2017-10-06 Thread Chris Murphy
Greg,

That fixed it! Thanks so much, that makes a lot more sense now.

-Chris


Re: [Rdkit-discuss] Implicit Hydrogens On Aromatic Heteroatoms

2017-10-05 Thread Greg Landrum
Hi Chris,

There's an additional step performed during sanitization that recognizes
that the implicit H needs to be on the N. The steps of a normal full
molecular sanitization operation are documented here:
http://www.rdkit.org/docs/RDKit_Book.html#molecular-sanitization

The adjustHs() function is not exposed directly to Python, but you can take
care of aromaticity assignment and adjustHs in a single call with the
SanitizeMol() function:

In [21]: m = Chem.MolFromMolBlock(mb,sanitize=False)

In [22]: Chem.SanitizeMol(m,sanitizeOps=Chem.SANITIZE_SETAROMATICITY|Chem.SANITIZE_ADJUSTHS)
Out[22]: rdkit.Chem.rdmolops.SanitizeFlags.SANITIZE_NONE

In [23]: Chem.MolToSmiles(m)
Out[23]: 'CC(C)c1cccc2c(Cl)cc(-c3ccc(-c4ccc(C(=O)O)cc4)[nH]3)nc12'

I highlighted the N in the heterocycle with the implicit H; it is the [nH]
in the SMILES above.
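
For anyone who wants to run the recipe outside an IPython session, here is a
minimal, self-contained sketch. It assumes a small stand-in molecule (pyrrole,
written out as a Kekule mol block) rather than the structure from this thread,
and the variable names are illustrative only.

```python
from rdkit import Chem

# Stand-in input (an assumption, not the thread's molecule): MolToMolBlock
# kekulizes by default, so this mol block has alternating single/double bonds.
mol_block = Chem.MolToMolBlock(Chem.MolFromSmiles("c1cc[nH]c1"))

# Parse without sanitization, as in the recipe above.
m = Chem.MolFromMolBlock(mol_block, sanitize=False)

# Perceive aromaticity and fix up the H counts in a single call.
failed_op = Chem.SanitizeMol(
    m,
    sanitizeOps=Chem.SANITIZE_SETAROMATICITY | Chem.SANITIZE_ADJUSTHS,
)

smiles = Chem.MolToSmiles(m)
nitrogen = next(a for a in m.GetAtoms() if a.GetSymbol() == "N")

print(failed_op)                    # SANITIZE_NONE means no step failed
print(smiles)                       # the ring N should be written as [nH]
print(nitrogen.GetNumExplicitHs())  # H count stored explicitly on the N
```

The return value of SanitizeMol() is worth checking: it is the same
SanitizeFlags value shown in Out[22] above.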

Hopefully this helps.

Best,
-greg
p.s. Note: while testing parts of this answer I uncovered a bug in
AdjustHs() that causes it to fail for molecules that include atoms with
"bad valences": https://github.com/rdkit/rdkit/issues/1605 This looks like
it should be easy to fix for the upcoming release.
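
A "bad valence" can be detected before attempting a partial sanitization. A
small sketch, assuming a deliberately invalid stand-in SMILES (a neutral
nitrogen with four bonds): with catchErrors=True, SanitizeMol() returns the
flag of the step that failed instead of raising an exception.

```python
from rdkit import Chem

# Deliberately invalid stand-in: a neutral N with four single bonds.
bad = Chem.MolFromSmiles("CN(C)(C)C", sanitize=False)

# catchErrors=True suppresses the exception; the return value names the
# first sanitization step that failed (SANITIZE_NONE if every step passed).
failed_op = Chem.SanitizeMol(bad, catchErrors=True)
print(failed_op)
```

Molecules that come back with anything other than SANITIZE_NONE here are the
ones likely to trip over the issue linked above.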



[Rdkit-discuss] Implicit Hydrogens On Aromatic Heteroatoms

2017-10-05 Thread Chris Murphy
Hi!

I'm running into an issue with implicit hydrogens on aromatic heteroatoms. I
am feeding the following SDF into a mol object:


  Mrv16c5 10021719092D

 28 31  0  0  0  0            999 V2000
   -2.9000    0.2076    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -2.1871    0.6228    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
    0.0063    0.2957    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
   -0.7486    0.6354    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.4657    0.2118    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.5599    0.9164    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -2.9000   -0.6228    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -2.1829   -1.0380    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.4657   -0.6228    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -3.6214    0.6228    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.6731    1.4574    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.1405    1.6335    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.8646    0.5976    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.3860    0.8367    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.0384    0.6731    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    4.3469    1.2687    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    1.7257    0.0818    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.8641    1.5161    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.6861    1.4322    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.5477   -0.0021    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -2.1829   -1.8683    0.0000 Cl  0  0  0  0  0  0  0  0  0  0  0  0
   -3.6297    1.4532    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    4.2085   -0.1656    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
   -3.6214   -1.0463    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -4.3385    0.1992    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -4.3385   -0.6311    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -4.3469    1.8600    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -2.9042    1.8683    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  2  1  1  0  0  0  0
  3  4  1  0  0  0  0
  4  5  1  0  0  0  0
  5  2  2  0  0  0  0
  6  3  1  0  0  0  0
  7  1  1  0  0  0  0
  8  7  1  0  0  0  0
  9  8  2  0  0  0  0
 10  1  2  0  0  0  0
 11  4  2  0  0  0  0
 12 11  1  0  0  0  0
 13 15  1  0  0  0  0
 14  6  1  0  0  0  0
 15 19  1  0  0  0  0
 16 13  2  0  0  0  0
 17 14  2  0  0  0  0
 18 14  1  0  0  0  0
 19 18  2  0  0  0  0
 20 17  1  0  0  0  0
 21  8  1  0  0  0  0
 22 10  1  0  0  0  0
 23 13  1  0  0  0  0
 24  7  2  0  0  0  0
 25 10  1  0  0  0  0
 26 25  2  0  0  0  0
 27 22  1  0  0  0  0
 28 22  1  0  0  0  0
 24 26  1  0  0  0  0
  9  5  1  0  0  0  0
  6 12  2  0  0  0  0
 20 15  2  0  0  0  0
M  END
$$$$

The nitrogen in the heterocycle should have 1 implicit hydrogen on it, and
when I look at it after initially creating the mol object, it does. I want
to convert it to aromatic form, so I am calling rdmolops.SetAromaticity(mol).
Once I do this, however, it seems that the implicit hydrogen on the nitrogen
is removed, which then causes an error to be thrown if I ever try to
convert it back to Kekule form or do any kind of sanitization. Maybe my
understanding of aromaticity is wrong, but shouldn't the hydrogen be on the
nitrogen regardless of whether or not it is considered to be in an aromatic
form?

I could be misunderstanding RDKit's aromaticity models, but for my
purposes, I want to be able to convert the mol to either Kekule or aromatic
form depending on some configuration settings. Is there a way to manipulate
the hydrogens on an atom in this case?

Thanks!
Chris
--
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss