Hi

Some interesting differences in behaviour compared with my RDKit
installation. Using the ChEMBL SMILES (freshly downloaded just now):

[NH-][NH+]=NC[C@H]1O[C@@H]2O[C@@H]3[C@@H](CN=[N+]=[N-])O[C@H](O[C@@H]4[C@@H](CN=[N+]=[N-])O[C@H](O[C@@H]5[C@@H](CN=[N+]=[N-])O[C@H](O[C@@H]6[C@@H](CN=[N+]=[N-])O[C@H](O[C@@H]7[C@@H](CN=[N+]=[N-])O[C@H](O[C@@H]8[C@@H](CN=[N+]=[N-])O[C@H](O[C@H]1[C@H](O)[C@H]2O)[C@H](O)[C@H]8O)[C@H](O)[C@H]7O)[C@H](O)[C@H]6O)[C@H](O)[C@H]5O)[C@H](O)[C@H]4O)[C@H](O)[C@H]3O

The problem atoms are the first two. If I convert this to an SD file
(using a C++ program based on the RDKit libraries), the resulting
SD file contains no charge information in the atom block; it's all in
M  CHG records (which are correct), and the problem atoms are still the
first two.
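
For anyone comparing files by hand, here is a minimal sketch (plain
Python, no RDKit needed) of how the two charge locations in a V2000 mol
block can be read. The function names are my own and purely
illustrative. The code table is the standard V2000 atom-block mapping,
so the 3 and 5 Brian mentions correspond to +1 and -1.

```python
# V2000 atom-block charge codes -> formal charge.
# Code 4 denotes a doublet radical, which carries no formal charge.
CHARGE_CODES = {0: 0, 1: 3, 2: 2, 3: 1, 4: 0, 5: -1, 6: -2, 7: -3}


def decode_charge_code(code):
    """Translate an atom-block charge code into a formal charge."""
    return CHARGE_CODES[code]


def parse_chg_line(line):
    """Parse one 'M  CHG' property line into {atom_number: charge}.

    Atom numbers are 1-based, as written in the file.
    """
    tokens = line.split()
    if tokens[:2] != ["M", "CHG"]:
        raise ValueError("not an M  CHG line: %r" % line)
    count = int(tokens[2])
    values = [int(t) for t in tokens[3:3 + 2 * count]]
    # Values alternate: atom number, charge, atom number, charge, ...
    return dict(zip(values[0::2], values[1::2]))


if __name__ == "__main__":
    print(decode_charge_code(3), decode_charge_code(5))
    print(parse_chg_line("M  CHG  1  84   1"))
```

Whitespace splitting is a shortcut that works here because every
M  CHG field is written with a leading space; a stricter reader would
slice the fixed-width columns instead.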

The bonding information is incorrect as there's a single bond between
the two nitrogens and it should be double. Editing the first record in
the bond block from -
  1  2  1  0
to -
  1  2  2  0
fixes the structure for me, and the resulting SD file can be processed
by other RDKit programs. I've attached the resulting file in case it
helps throw any light on what's happening.
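
If you'd rather script that edit than do it in a text editor, something
along these lines should work. It is only a sketch: it assumes a
single well-formed V2000 mol block with the counts line on the fourth
line, and set_bond_order and the tiny two-atom EXAMPLE block are
made up for illustration (they are not RDKit functions or real data).

```python
def set_bond_order(molblock, atom_a, atom_b, order):
    """Return a copy of a V2000 mol block with one bond's order changed.

    atom_a and atom_b are 1-based atom numbers as written in the file.
    """
    lines = molblock.split("\n")
    counts = lines[3]  # counts line: aaabbb... (atoms, bonds)
    n_atoms, n_bonds = int(counts[0:3]), int(counts[3:6])
    first_bond = 4 + n_atoms  # bond block follows the atom block
    for i in range(first_bond, first_bond + n_bonds):
        line = lines[i]
        pair = {int(line[0:3]), int(line[3:6])}
        if pair == {atom_a, atom_b}:
            # The bond type occupies columns 7-9 of the bond line.
            lines[i] = line[:6] + "%3d" % order + line[9:]
            return "\n".join(lines)
    raise ValueError("no bond between atoms %d and %d" % (atom_a, atom_b))


# Hypothetical two-atom block, just to exercise the function.
EXAMPLE = "\n".join([
    "example",
    "",
    "",
    "  2  1  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0",
    "    1.2000    0.0000    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0",
    "  1  2  1  0",
    "M  END",
])

if __name__ == "__main__":
    print(set_bond_order(EXAMPLE, 1, 2, 2))
```

Only the bond-type columns are rewritten, so everything else in the
file (coordinates, M  CHG records, data fields) is left untouched.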

I'm puzzled as to why the behaviour is significantly different for you...

Chris

On 5 October 2017 at 15:40, Bennion, Brian <benni...@llnl.gov> wrote:
> The sdf is an rdkit reading of the original smiles string, which if wrong
> would explain the funky charge settings in the mol block for atoms 84 and
> 85.  I modified these to 5 and 3 respectively to make the correct charge
> states, however, that did not resolve the issue.  Perhaps the bonding info
> is also incorrect.  The file is on a remote server so I will repost with
> attachment if I continue to have problems.
>
> Brian
>
>
> ________________________________
> From: Chris Earnshaw <cgearns...@gmail.com>
> Sent: Thursday, October 5, 2017 12:04:02 AM
> To: Bennion, Brian; RDKit Discuss (rdkit-discuss@lists.sourceforge.net)
> Subject: Re: [Rdkit-discuss] nitrogen valence issues
>
> Hi
>
> Be aware that there is a problem with one of the azide groups in
> CHEMBL592333 - in SMILES it's '-N=[NH+]-[NH-]' rather than '-N=[N+]=[N-]'.
> This doesn't render the structure chemically invalid but it's probably
> wrong.
>
> What's the provenance of your SD file? It isn't the same as a fresh
> download of this structure from ChEMBL, which can be processed by RDKit
> quite happily (allowing for the structure being wrong!). Is it possible that
> your file has got corrupted by some other processing step?
>
> Regards,
> Chris
>
> On 5 October 2017 at 03:28, Greg Landrum <greg.land...@gmail.com> wrote:
>>
>> Hi Brian,
>>
>> When you pasted that into the email the formatting of the mol block did
>> end up screwed up, which makes this hard to reproduce.
>> Could you please attach the mol block to the message as a file?
>>
>> -greg
>>
>> On Thu, Oct 5, 2017 at 2:21 AM, Bennion, Brian <benni...@llnl.gov> wrote:
>>>
>>> Hello,
>>>
>>> After looking at the email list and seeing that this error has cropped up
>>> several times for aromatic/aliphatic heterocyclic nitrogens I still haven’t
>>> been able to resolve the valence =4 error for one of the azo groups in a
>>> molecule that has 7.  The first couple of azo groups seem to be interpreted
>>> fine.
>>>
>>> Am I doing something incorrect here or is the mol file not formatted
>>> properly?
>>>
>>> Thanks
>>>
>>> Brian
>>>
>>>
>>>
>>>
>>>
>>> [16:50:29] Explicit valence for atom # 85 N, 4, is greater than permitted
>>>
>>> [16:50:29] ERROR: Could not sanitize molecule ending on line 206
>>>
>>> [16:50:29] ERROR: Explicit valence for atom # 85 N, 4, is greater than
>>> permitted
>>>
>>>
>>>
>>> CHEMBL592333
>>>
>>>            3D
>>>
>>>
>>>
>>> 91 98  0  0  0  0  0  0  0  0999 V2000
>>>
>>>     8.3826   -4.1789    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>     7.6967   -2.8968    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>     8.5551   -1.5926    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>     9.9817   -1.6449    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>    10.7075   -3.0051    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>     9.8956   -4.2577    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>     9.5145    2.2882    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>     9.8798    0.8284    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>    11.3118    0.3983    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>    12.3905    1.3820    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>    12.0187    2.9205    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>    10.6272    3.3273    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>     9.8204    4.3866    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>     7.9490    5.4779    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>     8.5179    4.0883    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>     4.3822    5.0511    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>     7.6673    2.8820    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>     6.4150    5.6541    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>     5.5680    4.4264    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>     6.1262    3.0849    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>    11.0958   -0.9118    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>     4.5054   -4.0612    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>     6.0427   -3.9376    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>     6.8866   -5.1186    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>     6.3028   -6.4902    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>     4.7637   -6.6458    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>     3.9029   -5.4182    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>     2.7159   -5.9831    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>     0.6356   -5.6026    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>     1.9599   -4.8851    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>     3.1331    3.3333    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>     3.0734    4.8222    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>     1.8216    2.5373    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>    -2.1064   -0.8835    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>    -1.6580    0.6255    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>    -1.8630   -3.0392    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>    -1.1133   -1.9528    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>     0.5294    3.2743    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>     1.7976    5.5400    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>     0.4790    4.7418    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>    -0.1940    0.9129    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>    -0.2800    2.2297    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>     0.3560   -1.6364    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>     0.7840   -0.1761    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>     2.0400   -3.4362    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>    -0.6446   -4.7799    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>    -0.5472   -3.2855    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>     0.7300   -2.5970    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>     8.1699   -5.4891    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>    -0.9196    6.9314    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>     3.3718   -2.7438    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>     3.4375   -1.2457    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>    12.2233   -2.9903    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>     7.3609   -7.5406    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>    -1.9851   -5.4581    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>     4.4420    2.6027    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>    13.8111    0.9358    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>    -3.5451   -1.3353    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>     0.5839   -7.0998    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>    10.5449   -5.6259    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>     4.2197   -8.0395    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>    13.2649    3.7658    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>    -0.8506    5.4331    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>    -2.6861    1.6826    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>     1.8710    1.0374    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>     6.7424    4.5334    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>     8.1208    4.6949    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>     5.0210   -4.9380    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>     4.8553   -6.2914    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>     9.6021   -2.6537    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>    11.1193   -2.6926    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>    -0.3366   -0.2945    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>    -1.8968   -0.3852    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>     8.7855    6.7340    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>     5.3923    6.7330    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>    10.8786    1.4957    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>    12.0747    2.2167    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>>    -2.4480   -1.8457    0.0000 N   0  3  0  0  0  0  0  0  0  0  0  0
>>>
>>>    -3.5497   -2.9029    0.0000 N   0  5  0  0  0  0  0  0  0  0  0  0
>>>
>>>    -2.2517    7.6211    0.0000 N   0  3  0  0  0  0  0  0  0  0  0  0
>>>
>>>    -3.5805    8.3170    0.0000 N   0  5  0  0  0  0  0  0  0  0  0  0
>>>
>>>     9.2243    5.6060    0.0000 N   0  3  0  0  0  0  0  0  0  0  0  0
>>>
>>>    10.5994    6.3590    0.0000 N   0  5  0  0  0  0  0  0  0  0  0  0
>>>
>>>    13.4890    2.2193    0.0000 N   0  3  0  0  0  0  0  0  0  0  0  0
>>>
>>>    15.0494    2.4295    0.0000 N   0  5  0  0  0  0  0  0  0  0  0  0
>>>
>>>    11.6943   -4.1991    0.0000 N   0  3  0  0  0  0  0  0  0  0  0  0
>>>
>>>    12.8782   -5.1236    0.0000 N   0  5  0  0  0  0  0  0  0  0  0  0
>>>
>>>     3.4537   -7.0177    0.0000 N   0  3  0  0  0  0  0  0  0  0  0  0
>>>
>>>     2.4539   -8.1103    0.0000 N   0  5  0  0  0  0  0  0  0  0  0  0
>>>
>>>     4.7687   -0.5531    0.0000 N   0  3  0  0  0  0  0  0  0  0  0  0
>>>
>>>     6.1360    0.0648    0.0000 N   0  5  0  0  0  0  0  0  0  0  0  0
>>>
>>>   7 12  1  0  0  0  0
>>>
>>> 31 33  1  0  0  0  0
>>>
>>>   9  8  1  0  0  0  0
>>>
>>> 63 50  1  0  0  0  0
>>>
>>>   9 10  1  0  0  0  0
>>>
>>> 45 51  1  0  0  0  0
>>>
>>> 34 35  1  0  0  0  0
>>>
>>> 51 52  1  0  0  0  0
>>>
>>> 10 11  1  0  0  0  0
>>>
>>> 36 37  1  0  0  0  0
>>>
>>> 34 37  1  0  0  0  0
>>>
>>> 38 33  1  0  0  0  0
>>>
>>> 12 11  1  0  0  0  0
>>>
>>> 13 12  1  0  0  0  0
>>>
>>> 38 40  1  0  0  0  0
>>>
>>>   5 53  1  0  0  0  0
>>>
>>> 32 39  1  0  0  0  0
>>>
>>> 25 54  1  0  0  0  0
>>>
>>> 39 40  1  0  0  0  0
>>>
>>> 46 55  1  0  0  0  0
>>>
>>> 35 64  1  0  0  0  0
>>>
>>> 41 35  1  0  0  0  0
>>>
>>> 31 56  1  0  0  0  0
>>>
>>>   5  6  1  0  0  0  0
>>>
>>> 10 57  1  0  0  0  0
>>>
>>> 41 42  1  0  0  0  0
>>>
>>> 34 58  1  0  0  0  0
>>>
>>> 38 42  1  0  0  0  0
>>>
>>> 29 59  1  0  0  0  0
>>>
>>> 37 43  1  0  0  0  0
>>>
>>>   6 60  1  0  0  0  0
>>>
>>>   7  8  1  0  0  0  0
>>>
>>> 26 61  1  0  0  0  0
>>>
>>> 43 44  1  0  0  0  0
>>>
>>> 11 62  1  0  0  0  0
>>>
>>> 41 44  1  0  0  0  0
>>>
>>>   9 21  1  0  0  0  0
>>>
>>> 40 63  1  0  0  0  0
>>>
>>> 22 23  1  0  0  0  0
>>>
>>> 13 15  1  0  0  0  0
>>>
>>> 14 15  1  0  0  0  0
>>>
>>> 33 65  1  0  0  0  0
>>>
>>>   1  2  1  0  0  0  0
>>>
>>> 20 66  1  0  0  0  0
>>>
>>> 15 17  1  0  0  0  0
>>>
>>> 66 67  1  0  0  0  0
>>>
>>> 30 45  1  0  0  0  0
>>>
>>> 23 68  1  0  0  0  0
>>>
>>>   1  6  1  0  0  0  0
>>>
>>> 68 69  1  0  0  0  0
>>>
>>>   2  3  1  0  0  0  0
>>>
>>>   3 70  1  0  0  0  0
>>>
>>> 27 22  1  0  0  0  0
>>>
>>> 70 71  1  0  0  0  0
>>>
>>> 23 24  1  0  0  0  0
>>>
>>> 43 72  1  0  0  0  0
>>>
>>> 45 48  1  0  0  0  0
>>>
>>> 72 73  1  0  0  0  0
>>>
>>> 29 46  1  0  0  0  0
>>>
>>> 46 47  1  0  0  0  0
>>>
>>> 47 48  1  0  0  0  0
>>>
>>> 14 74  1  0  0  0  0
>>>
>>> 36 47  1  0  0  0  0
>>>
>>> 18 75  1  0  0  0  0
>>>
>>> 24 25  1  0  0  0  0
>>>
>>> 25 26  1  0  0  0  0
>>>
>>> 26 27  1  0  0  0  0
>>>
>>>   7 76  1  0  0  0  0
>>>
>>> 28 27  1  0  0  0  0
>>>
>>> 76 77  1  0  0  0  0
>>>
>>>   3  4  1  0  0  0  0
>>>
>>> 73 78  2  3  0  0  0
>>>
>>>   4  5  1  0  0  0  0
>>>
>>> 78 79  2  0  0  0  0
>>>
>>> 17 20  1  0  0  0  0
>>>
>>> 50 80  2  3  0  0  0
>>>
>>> 14 18  1  0  0  0  0
>>>
>>> 80 81  2  0  0  0  0
>>>
>>> 28 30  1  0  0  0  0
>>>
>>> 67 82  2  3  0  0  0
>>>
>>> 29 30  1  0  0  0  0
>>>
>>> 82 83  2  0  0  0  0
>>>
>>> 24 49  1  0  0  0  0
>>>
>>> 77 84  2  3  0  0  0
>>>
>>> 18 19  1  0  0  0  0
>>>
>>> 84 85  2  0  0  0  0
>>>
>>>   4 21  1  0  0  0  0
>>>
>>> 71 86  2  0  0  0  0
>>>
>>>   1 49  1  0  0  0  0
>>>
>>> 86 87  2  0  0  0  0
>>>
>>> 19 20  1  0  0  0  0
>>>
>>> 69 88  2  3  0  0  0
>>>
>>> 31 32  1  0  0  0  0
>>>
>>> 88 89  2  0  0  0  0
>>>
>>> 16 32  1  0  0  0  0
>>>
>>> 52 90  2  3  0  0  0
>>>
>>> 16 19  1  0  0  0  0
>>>
>>> 90 91  2  0  0  0  0
>>>
>>> M  CHG  1  78   1
>>>
>>> M  CHG  1  79  -1
>>>
>>> M  CHG  1  80   1
>>>
>>> M  CHG  1  81  -1
>>>
>>> M  CHG  1  82   1
>>>
>>> M  CHG  1  83  -1
>>>
>>> M  CHG  1  84   1
>>>
>>> M  CHG  1  85  -1
>>>
>>> M  CHG  1  88   1
>>>
>>> M  CHG  1  89  -1
>>>
>>> M  CHG  1  90   1
>>>
>>> M  CHG  1  91  -1
>>>
>>> M  END
>>>
>>> >  <chembl_id>  (540484)
>>>
>>> CHEMBL592333
>>>
>>> $$$$
>>>
>>>
>>>
>>> ------------------------------------------------------------------------------
>>> Check out the vibrant tech community on one of the world's most
>>> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
>>> _______________________________________________
>>> Rdkit-discuss mailing list
>>> Rdkit-discuss@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>>
>>
>>
>>
>>
>

Attachment: CHEMBL592333_fix.sdf
Description: chemical/mdl-sdfile
