Re: [Rdkit-discuss] can't kekulize molecule

2017-08-17 Thread Greg Landrum
The two primary functions that handle the current atom type semantics
are cleanUpMol2Substructures() and guessFormalCharges() here:
https://github.com/rdkit/rdkit/blob/master/Code/GraphMol/FileParsers/Mol2FileParser.cpp


On Wed, Aug 16, 2017 at 8:00 PM, Francois BERENGER <
beren...@bioreg.kyushu-u.ac.jp> wrote:

> On 08/16/2017 06:14 PM, Greg Landrum wrote:
>
>> On Wed, Aug 16, 2017 at 3:55 AM, Francois BERENGER <
>> beren...@bioreg.kyushu-u.ac.jp> wrote:
>>
>>> On 08/16/2017 03:36 PM, Greg Landrum wrote:
>>>
>>>> The RDKit Mol2 parser is really only validated for the atom
>>>> types generated by corina. I'm not surprised that the output from
>>>> open babel would not be understood. This is documented:
>>>> http://rdkit.org/docs/api/rdkit.Chem.rdmolfiles-module.html#MolFromMol2File
>>>
>>> It would be really nice if open babel MOL2 output could directly be
>>> read in by rdkit.
>>
>> Adding this support is not an impossible task for someone who understands
>> the open babel interpretation of the Mol2 atom types. Nik's code for
>> dealing with the cleanup of the corina atom types is quite well documented
>> and creating a bunch of test cases using OpenBabel would be pretty
>> straightforward. It would take time and care though.
>
> Can you point out that code?
>
> I may have a look one day.
>
>> I'd guess that in the end it's easier and more straightforward to just let
>> open babel do the translation.
>>
>>> I often find myself running
>>> $ obabel in.mol2 -O out.sdf
>>> just for that purpose.
>>
>> The question I always end up asking here is: Why do you have open babel
>> mol2 files in the first place?
>> If you're reading those into another piece of software (the usual
>> answer): are you sure that the other software and open babel interpret the
>> atom types the same way? Really sure?
>>
>> -greg
--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] can't kekulize molecule

2017-08-17 Thread Greg Landrum
On Wed, Aug 16, 2017 at 7:14 PM, David Liu wrote:
>
>
> Thanks a lot for your reply! It makes sense to me. I wonder if there is a
> plan for rdkit to generate the mol2 file directly from the rdkit molecule
> object?
>

There is a pull request active for a python-based mol2 writer:
https://github.com/rdkit/rdkit/pull/415
Maciej made a lot of progress and then things got stuck; likely due to
someone running out of time.

I just quickly looked through the PR and it looks like it's pretty close to
being useable (at least as a first version), but it will require someone to
invest some time in the last tuning and testing.

-greg


Re: [Rdkit-discuss] can't kekulize molecule

2017-08-16 Thread Francois BERENGER

On 08/16/2017 06:14 PM, Greg Landrum wrote:

> On Wed, Aug 16, 2017 at 3:55 AM, Francois BERENGER wrote:
>
>> On 08/16/2017 03:36 PM, Greg Landrum wrote:
>>
>>> The RDKit Mol2 parser is really only validated for the atom
>>> types generated by corina. I'm not surprised that the output from
>>> open babel would not be understood. This is documented:
>>> http://rdkit.org/docs/api/rdkit.Chem.rdmolfiles-module.html#MolFromMol2File
>>
>> It would be really nice if open babel MOL2 output could directly be read
>> in by rdkit.
>
> Adding this support is not an impossible task for someone who
> understands the open babel interpretation of the Mol2 atom types. Nik's
> code for dealing with the cleanup of the corina atom types is quite well
> documented and creating a bunch of test cases using OpenBabel would be
> pretty straightforward. It would take time and care though.

Can you point out that code?

I may have a look one day.

> I'd guess that in the end it's easier and more straightforward to just
> let open babel do the translation.
>
>> I often find myself running
>> $ obabel in.mol2 -O out.sdf
>> just for that purpose.
>
> The question I always end up asking here is: Why do you have open babel
> mol2 files in the first place?
> If you're reading those into another piece of software (the usual
> answer): are you sure that the other software and open babel interpret
> the atom types the same way? Really sure?
>
> -greg





Re: [Rdkit-discuss] can't kekulize molecule

2017-08-16 Thread David Liu
Hi Greg,

Thanks a lot for your reply! It makes sense to me. I wonder if there is a
plan for rdkit to generate the mol2 file directly from the rdkit molecule
object?

Thanks,
Shuai

On Wed, Aug 16, 2017 at 5:14 AM, Greg Landrum wrote:

>
>
> On Wed, Aug 16, 2017 at 3:55 AM, Francois BERENGER <
> beren...@bioreg.kyushu-u.ac.jp> wrote:
>
>> On 08/16/2017 03:36 PM, Greg Landrum wrote:
>>
>>>
>>> The RDKit Mol2 parser is really only validated for the atom types
>>> generated by corina. I'm not surprised that the output from open babel would
>>> not be understood. This is documented:
>>> http://rdkit.org/docs/api/rdkit.Chem.rdmolfiles-module.html#MolFromMol2File
>>>
>>
>> It would be really nice if open babel MOL2 output could directly be read
>> in by rdkit.
>>
>
> Adding this support is not an impossible task for someone who understands
> the open babel interpretation of the Mol2 atom types. Nik's code for
> dealing with the cleanup of the corina atom types is quite well documented
> and creating a bunch of test cases using OpenBabel would be pretty
> straightforward. It would take time and care though.
>
> I'd guess that in the end it's easier and more straightforward to just let
> open babel do the translation.
>
>
>> I often find myself running
>> $ obabel in.mol2 -O out.sdf
>> just for that purpose.
>>
>
> The question I always end up asking here is: Why do you have open babel
> mol2 files in the first place?
> If you're reading those into another piece of software (the usual answer):
> are you sure that the other software and open babel interpret the atom
> types the same way? Really sure?
>
> -greg
>
>


Re: [Rdkit-discuss] can't kekulize molecule

2017-08-16 Thread Greg Landrum
On Wed, Aug 16, 2017 at 3:55 AM, Francois BERENGER <
beren...@bioreg.kyushu-u.ac.jp> wrote:

> On 08/16/2017 03:36 PM, Greg Landrum wrote:
>
>>
>> The RDKit Mol2 parser is really only validated for the atom types
>> generated by corina. I'm not surprised that the output from open babel would
>> not be understood. This is documented:
>> http://rdkit.org/docs/api/rdkit.Chem.rdmolfiles-module.html#MolFromMol2File
>>
>
> It would be really nice if open babel MOL2 output could directly be read
> in by rdkit.
>

Adding this support is not an impossible task for someone who understands
the open babel interpretation of the Mol2 atom types. Nik's code for
dealing with the cleanup of the corina atom types is quite well documented
and creating a bunch of test cases using OpenBabel would be pretty
straightforward. It would take time and care though.

I'd guess that in the end it's easier and more straightforward to just let
open babel do the translation.


> I often find myself running
> $ obabel in.mol2 -O out.sdf
> just for that purpose.
>

The question I always end up asking here is: Why do you have open babel
mol2 files in the first place?
If you're reading those into another piece of software (the usual answer):
are you sure that the other software and open babel interpret the atom
types the same way? Really sure?

-greg


Re: [Rdkit-discuss] can't kekulize molecule

2017-08-16 Thread Francois BERENGER

On 08/16/2017 03:36 PM, Greg Landrum wrote:

> Hi Shuai,
>
> The RDKit Mol2 parser is really only validated for the atom types
> generated by corina. I'm not surprised that the output from open babel
> would not be understood. This is documented:
>
> http://rdkit.org/docs/api/rdkit.Chem.rdmolfiles-module.html#MolFromMol2File


It would be really nice if open babel MOL2 output could directly be read
in by rdkit.

I often find myself running
$ obabel in.mol2 -O out.sdf
just for that purpose.
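
The same conversion can be scripted. This is only a minimal sketch, assuming
the obabel executable is on the PATH; obabel_command and convert_with_obabel
are hypothetical helper names, not part of Open Babel or RDKit:

```python
import shutil
import subprocess


def obabel_command(src, dst):
    """Build the argument list for `obabel <src> -O <dst>`."""
    return ["obabel", src, "-O", dst]


def convert_with_obabel(src, dst):
    """Run Open Babel on src and write dst (same call as the shell line above)."""
    if shutil.which("obabel") is None:
        raise FileNotFoundError("obabel executable not found on PATH")
    # check=True raises CalledProcessError if obabel exits with a non-zero status.
    subprocess.run(obabel_command(src, dst), check=True)
```

Calling convert_with_obabel("in.mol2", "out.sdf") runs the same command as
the shell line above; passing the arguments as a list avoids shell quoting
issues with unusual file names.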

> An aside: If you have an SDF file you can read that directly into the
> RDKit. It seems like you shouldn't need the openbabel translation step
> at all.
>
> -greg


On Wed, Aug 16, 2017 at 12:13 AM, David Liu wrote:


Dear all,

I am having trouble kekulizing a molecule with rdkit. Below is an example.

The example.mol2 file looks like this:

@<TRIPOS>MOLECULE
example
46 49 0 0 0
SMALL
GASTEIGER

@<TRIPOS>ATOM
1 C -4.5556 -0.2844 1.1718 C.3 1 LIG1 -0.0109
2 C -6.0291 -0.7271 1.2334 C.3 1 LIG1 0.0493
3 C -6.4413 -0.5958 -1.0493 C.3 1 LIG1 0.0493
4 C -5.1977 0.3130 -1.1927 C.3 1 LIG1 -0.0109
5 C 5.5992 -2.5640 -0.8780 C.ar 1 LIG1 -0.0253
6 O -6.3822 -1.4588 0.0764 O.3 1 LIG1 -0.3796
7 C 2.8943 1.6722 0.9911 C.ar 1 LIG1 0.2664
8 C 5.1745 -2.0407 0.3480 C.ar 1 LIG1 0.1371
9 C -1.6179 0.4017 0.1577 C.ar 1 LIG1 0.2173
10 C -4.0573 -0.1702 -0.2838 C.3 1 LIG1 0.0275
11 C 0.8767 -0.2307 1.1489 C.ar 1 LIG1 0.0370
12 C 2.1438 -0.5325 1.6439 C.ar 1 LIG1 -0.0306
13 C 6.1958 -1.7294 -1.8279 C.ar 1 LIG1 -0.0590
14 C 6.3717 -0.3702 -1.5525 C.ar 1 LIG1 -0.0605
15 C 5.9487 0.1564 -0.3282 C.ar 1 LIG1 -0.0452
16 C 0.6358 1.0320 0.5744 C.ar 1 LIG1 0.1483
17 C -0.1716 -1.1537 1.2042 C.ar 1 LIG1 0.0418
18 C 3.1618 0.4153 1.5592 C.ar 1 LIG1 0.0780
19 C 5.3424 -0.6749 0.6231 C.ar 1 LIG1 0.0480
20 C 1.3530 3.2786 -0.1013 C.3 1 LIG1 0.0167
21 F 4.6032 -2.8623 1.2640 F 1 LIG1 -0.2043
22 S 4.7969 0.0115 2.1898 S.3 1 LIG1 -0.0812
23 N -1.3906 -0.8211 0.7091 N.ar 1 LIG1 -0.
24 O 3.8206 2.5277 0.9363 O.2 1 LIG1 -0.2664
25 N 1.6412 1.9659 0.5033 N.ar 1 LIG1 -0.2949
26 N -0.6088 1.3106 0.0937 N.ar 1 LIG1 -0.1964
27 N -2.9091 0.7394 -0.3655 N.pl3 1 LIG1 -0.3104
28 H -3.9262 -1.0225 1.7144 H 1 LIG1 0.0305
29 H -4.4544 0.6942 1.6907 H 1 LIG1 0.0305
30 H -6.1785 -1.3738 2.1237 H 1 LIG1 0.0560
31 H -6.6965 0.1565 1.3647 H 1 LIG1 0.0560
32 H -7.3658 0.0220 -1.0063 H 1 LIG1 0.0560
33 H -6.5227 -1.2302 -1.9574 H 1 LIG1 0.0560
34 H -4.8575 0.3261 -2.2513 H 1 LIG1 0.0305
35 H -5.4753 1.3532 -0.9112 H 1 LIG1 0.0305
36 H 5.4676 -3.6168 -1.0922 H 1 LIG1 0.0646
37 H -3.7461 -1.1771 -0.6436 H 1 LIG1 0.0500
38 H 2.3428 -1.4998 2.0895 H 1 LIG1 0.0638
39 H 6.5237 -2.1362 -2.7758 H 1 LIG1 0.0618
40 H 6.8363 0.2748 -2.2870 H 1 LIG1 0.0618
41 H 6.0904 1.2094 -0.1219 H 1 LIG1 0.0630
42 H -0.0243 -2.1352 1.6372 H 1 LIG1 0.0838
43 H 2.2342 3.9528 -0.1073 H 1 LIG1 0.0457
44 H 0.5450 3.7853 0.4685 H 1 LIG1 0.0457
45 H 1.0258 3.1432 -1.1544 H 1 LIG1 0.0457
46 H -3.0166 1.6655 -0.8392 H 1 LIG1 0.1492
@<TRIPOS>BOND
1 1 2 1
2 1 10 1
3 2 6 1
4 3 4 1
5 3 6 1
6 4 10 1
7 5 8 ar
8 5 13 ar
9 7 18 ar
10 7 24 2
11 7 25 ar
12 8 19 ar
13 8 21 1
14 9 23 ar
15 9 26 ar
16 9 27 1
17 10 27 1
18 11 12 ar
19 11 16 ar
20 11 17 ar
21 12 18 ar
22 13 14 ar
23 14 15 ar
24 15 19 ar
25 16 25 ar
26 16 26 ar
27 17 23 ar
28 18 22 1
29 19 22 1
30 20 25 1
31 1 28 1
32 1 29 1
33 2 30 1
34 2 31 1
35 3 32 1
36 3 33 1
37 4 34 1
38 4 35 1
39 5 36 1
40 10 37 1
41 12 38 1
42 13 39 1
43 14 40 1
44 15 41 1
45 17 42 1
46 20 43 1
47 20 44 1
48 20 45 1
49 27 46 1

And the example.py code looks like this:

from rdkit.Chem import AllChem
from rdkit import Chem

rdkit_mol = Chem.MolFromMol2File("example.mol2", sanitize=False,
                                 removeHs=False)
mol = AllChem.RemoveHs(rdkit_mol)

Running example.py returns the following error:

ValueError: Sanitization error: Can't kekulize mol. Unkekulized atoms: 8 10 11 15 16 17 22 24 25

It seems rdkit cannot interpret the molecule when it tries to remove the
hydrogens, probably because of the format of the mol2 file I used. I
generated the mol2 file from an sdf file with openbabel. Is there a plan
to support parsing mol2 files like this one, or do I need to process the
mol2 file further? I would appreciate any advice!


Thanks,

Shuai




Re: [Rdkit-discuss] can't kekulize molecule

2017-08-16 Thread Greg Landrum
Hi Shuai,

The RDKit Mol2 parser is really only validated for the atom types generated
by corina. I'm not surprised that the output from open babel would not be
understood. This is documented:
http://rdkit.org/docs/api/rdkit.Chem.rdmolfiles-module.html#MolFromMol2File

An aside: If you have an SDF file you can read that directly into the
RDKit. It seems like you shouldn't need the openbabel translation step at
all.
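
A minimal sketch of that direct route, assuming the original SDF is still
available; load_sdf is an illustrative helper name rather than an RDKit
function, and removeHs=False is passed only to mirror the example.py call
quoted below:

```python
from rdkit import Chem


def load_sdf(path):
    """Read every record of an SDF file, keeping explicit hydrogens."""
    supplier = Chem.SDMolSupplier(path, removeHs=False)
    # The supplier yields None for records that fail to parse or sanitize.
    return [mol for mol in supplier if mol is not None]
```

With a hypothetical file example.sdf, load_sdf("example.sdf") returns a list
of sanitized molecules, and Chem.RemoveHs can then be applied to each one.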

-greg


On Wed, Aug 16, 2017 at 12:13 AM, David Liu wrote:

> Dear all,
>
> I am having trouble kekulizing a molecule with rdkit. Below is an example.
>
> The example.mol2 file looks like this:
>
> @<TRIPOS>MOLECULE
> example
> 46 49 0 0 0
> SMALL
> GASTEIGER
>
> @<TRIPOS>ATOM
> 1 C -4.5556 -0.2844 1.1718 C.3 1 LIG1 -0.0109
> 2 C -6.0291 -0.7271 1.2334 C.3 1 LIG1 0.0493
> 3 C -6.4413 -0.5958 -1.0493 C.3 1 LIG1 0.0493
> 4 C -5.1977 0.3130 -1.1927 C.3 1 LIG1 -0.0109
> 5 C 5.5992 -2.5640 -0.8780 C.ar 1 LIG1 -0.0253
> 6 O -6.3822 -1.4588 0.0764 O.3 1 LIG1 -0.3796
> 7 C 2.8943 1.6722 0.9911 C.ar 1 LIG1 0.2664
> 8 C 5.1745 -2.0407 0.3480 C.ar 1 LIG1 0.1371
> 9 C -1.6179 0.4017 0.1577 C.ar 1 LIG1 0.2173
> 10 C -4.0573 -0.1702 -0.2838 C.3 1 LIG1 0.0275
> 11 C 0.8767 -0.2307 1.1489 C.ar 1 LIG1 0.0370
> 12 C 2.1438 -0.5325 1.6439 C.ar 1 LIG1 -0.0306
> 13 C 6.1958 -1.7294 -1.8279 C.ar 1 LIG1 -0.0590
> 14 C 6.3717 -0.3702 -1.5525 C.ar 1 LIG1 -0.0605
> 15 C 5.9487 0.1564 -0.3282 C.ar 1 LIG1 -0.0452
> 16 C 0.6358 1.0320 0.5744 C.ar 1 LIG1 0.1483
> 17 C -0.1716 -1.1537 1.2042 C.ar 1 LIG1 0.0418
> 18 C 3.1618 0.4153 1.5592 C.ar 1 LIG1 0.0780
> 19 C 5.3424 -0.6749 0.6231 C.ar 1 LIG1 0.0480
> 20 C 1.3530 3.2786 -0.1013 C.3 1 LIG1 0.0167
> 21 F 4.6032 -2.8623 1.2640 F 1 LIG1 -0.2043
> 22 S 4.7969 0.0115 2.1898 S.3 1 LIG1 -0.0812
> 23 N -1.3906 -0.8211 0.7091 N.ar 1 LIG1 -0.
> 24 O 3.8206 2.5277 0.9363 O.2 1 LIG1 -0.2664
> 25 N 1.6412 1.9659 0.5033 N.ar 1 LIG1 -0.2949
> 26 N -0.6088 1.3106 0.0937 N.ar 1 LIG1 -0.1964
> 27 N -2.9091 0.7394 -0.3655 N.pl3 1 LIG1 -0.3104
> 28 H -3.9262 -1.0225 1.7144 H 1 LIG1 0.0305
> 29 H -4.4544 0.6942 1.6907 H 1 LIG1 0.0305
> 30 H -6.1785 -1.3738 2.1237 H 1 LIG1 0.0560
> 31 H -6.6965 0.1565 1.3647 H 1 LIG1 0.0560
> 32 H -7.3658 0.0220 -1.0063 H 1 LIG1 0.0560
> 33 H -6.5227 -1.2302 -1.9574 H 1 LIG1 0.0560
> 34 H -4.8575 0.3261 -2.2513 H 1 LIG1 0.0305
> 35 H -5.4753 1.3532 -0.9112 H 1 LIG1 0.0305
> 36 H 5.4676 -3.6168 -1.0922 H 1 LIG1 0.0646
> 37 H -3.7461 -1.1771 -0.6436 H 1 LIG1 0.0500
> 38 H 2.3428 -1.4998 2.0895 H 1 LIG1 0.0638
> 39 H 6.5237 -2.1362 -2.7758 H 1 LIG1 0.0618
> 40 H 6.8363 0.2748 -2.2870 H 1 LIG1 0.0618
> 41 H 6.0904 1.2094 -0.1219 H 1 LIG1 0.0630
> 42 H -0.0243 -2.1352 1.6372 H 1 LIG1 0.0838
> 43 H 2.2342 3.9528 -0.1073 H 1 LIG1 0.0457
> 44 H 0.5450 3.7853 0.4685 H 1 LIG1 0.0457
> 45 H 1.0258 3.1432 -1.1544 H 1 LIG1 0.0457
> 46 H -3.0166 1.6655 -0.8392 H 1 LIG1 0.1492
> @<TRIPOS>BOND
> 1 1 2 1
> 2 1 10 1
> 3 2 6 1
> 4 3 4 1
> 5 3 6 1
> 6 4 10 1
> 7 5 8 ar
> 8 5 13 ar
> 9 7 18 ar
> 10 7 24 2
> 11 7 25 ar
> 12 8 19 ar
> 13 8 21 1
> 14 9 23 ar
> 15 9 26 ar
> 16 9 27 1
> 17 10 27 1
> 18 11 12 ar
> 19 11 16 ar
> 20 11 17 ar
> 21 12 18 ar
> 22 13 14 ar
> 23 14 15 ar
> 24 15 19 ar
> 25 16 25 ar
> 26 16 26 ar
> 27 17 23 ar
> 28 18 22 1
> 29 19 22 1
> 30 20 25 1
> 31 1 28 1
> 32 1 29 1
> 33 2 30 1
> 34 2 31 1
> 35 3 32 1
> 36 3 33 1
> 37 4 34 1
> 38 4 35 1
> 39 5 36 1
> 40 10 37 1
> 41 12 38 1
> 42 13 39 1
> 43 14 40 1
> 44 15 41 1
> 45 17 42 1
> 46 20 43 1
> 47 20 44 1
> 48 20 45 1
> 49 27 46 1
>
> And the example.py code looks like this:
>
> from rdkit.Chem import AllChem
> from rdkit import Chem
>
> rdkit_mol = Chem.MolFromMol2File("example.mol2", sanitize=False,
>                                  removeHs=False)
> mol = AllChem.RemoveHs(rdkit_mol)
>
> Running example.py returns the following error:
>
> ValueError: Sanitization error: Can't kekulize mol. Unkekulized atoms: 8 10 11 15 16 17 22 24 25
>
> It seems rdkit cannot interpret the molecule when it tries to remove the
> hydrogens, probably because of the format of the mol2 file I used. I
> generated the mol2 file from an sdf file with openbabel. Is there a plan
> to support parsing mol2 files like this one, or do I need to process the
> mol2 file further? I would appreciate any advice!
>
>
> Thanks,
>
> Shuai
>

[Rdkit-discuss] can't kekulize molecule

2017-08-15 Thread David Liu
Dear all,

I am having trouble kekulizing a molecule with rdkit. Below is an example.

The example.mol2 file looks like this:

@<TRIPOS>MOLECULE
example
46 49 0 0 0
SMALL
GASTEIGER

@<TRIPOS>ATOM
1 C -4.5556 -0.2844 1.1718 C.3 1 LIG1 -0.0109
2 C -6.0291 -0.7271 1.2334 C.3 1 LIG1 0.0493
3 C -6.4413 -0.5958 -1.0493 C.3 1 LIG1 0.0493
4 C -5.1977 0.3130 -1.1927 C.3 1 LIG1 -0.0109
5 C 5.5992 -2.5640 -0.8780 C.ar 1 LIG1 -0.0253
6 O -6.3822 -1.4588 0.0764 O.3 1 LIG1 -0.3796
7 C 2.8943 1.6722 0.9911 C.ar 1 LIG1 0.2664
8 C 5.1745 -2.0407 0.3480 C.ar 1 LIG1 0.1371
9 C -1.6179 0.4017 0.1577 C.ar 1 LIG1 0.2173
10 C -4.0573 -0.1702 -0.2838 C.3 1 LIG1 0.0275
11 C 0.8767 -0.2307 1.1489 C.ar 1 LIG1 0.0370
12 C 2.1438 -0.5325 1.6439 C.ar 1 LIG1 -0.0306
13 C 6.1958 -1.7294 -1.8279 C.ar 1 LIG1 -0.0590
14 C 6.3717 -0.3702 -1.5525 C.ar 1 LIG1 -0.0605
15 C 5.9487 0.1564 -0.3282 C.ar 1 LIG1 -0.0452
16 C 0.6358 1.0320 0.5744 C.ar 1 LIG1 0.1483
17 C -0.1716 -1.1537 1.2042 C.ar 1 LIG1 0.0418
18 C 3.1618 0.4153 1.5592 C.ar 1 LIG1 0.0780
19 C 5.3424 -0.6749 0.6231 C.ar 1 LIG1 0.0480
20 C 1.3530 3.2786 -0.1013 C.3 1 LIG1 0.0167
21 F 4.6032 -2.8623 1.2640 F 1 LIG1 -0.2043
22 S 4.7969 0.0115 2.1898 S.3 1 LIG1 -0.0812
23 N -1.3906 -0.8211 0.7091 N.ar 1 LIG1 -0.
24 O 3.8206 2.5277 0.9363 O.2 1 LIG1 -0.2664
25 N 1.6412 1.9659 0.5033 N.ar 1 LIG1 -0.2949
26 N -0.6088 1.3106 0.0937 N.ar 1 LIG1 -0.1964
27 N -2.9091 0.7394 -0.3655 N.pl3 1 LIG1 -0.3104
28 H -3.9262 -1.0225 1.7144 H 1 LIG1 0.0305
29 H -4.4544 0.6942 1.6907 H 1 LIG1 0.0305
30 H -6.1785 -1.3738 2.1237 H 1 LIG1 0.0560
31 H -6.6965 0.1565 1.3647 H 1 LIG1 0.0560
32 H -7.3658 0.0220 -1.0063 H 1 LIG1 0.0560
33 H -6.5227 -1.2302 -1.9574 H 1 LIG1 0.0560
34 H -4.8575 0.3261 -2.2513 H 1 LIG1 0.0305
35 H -5.4753 1.3532 -0.9112 H 1 LIG1 0.0305
36 H 5.4676 -3.6168 -1.0922 H 1 LIG1 0.0646
37 H -3.7461 -1.1771 -0.6436 H 1 LIG1 0.0500
38 H 2.3428 -1.4998 2.0895 H 1 LIG1 0.0638
39 H 6.5237 -2.1362 -2.7758 H 1 LIG1 0.0618
40 H 6.8363 0.2748 -2.2870 H 1 LIG1 0.0618
41 H 6.0904 1.2094 -0.1219 H 1 LIG1 0.0630
42 H -0.0243 -2.1352 1.6372 H 1 LIG1 0.0838
43 H 2.2342 3.9528 -0.1073 H 1 LIG1 0.0457
44 H 0.5450 3.7853 0.4685 H 1 LIG1 0.0457
45 H 1.0258 3.1432 -1.1544 H 1 LIG1 0.0457
46 H -3.0166 1.6655 -0.8392 H 1 LIG1 0.1492
@<TRIPOS>BOND
1 1 2 1
2 1 10 1
3 2 6 1
4 3 4 1
5 3 6 1
6 4 10 1
7 5 8 ar
8 5 13 ar
9 7 18 ar
10 7 24 2
11 7 25 ar
12 8 19 ar
13 8 21 1
14 9 23 ar
15 9 26 ar
16 9 27 1
17 10 27 1
18 11 12 ar
19 11 16 ar
20 11 17 ar
21 12 18 ar
22 13 14 ar
23 14 15 ar
24 15 19 ar
25 16 25 ar
26 16 26 ar
27 17 23 ar
28 18 22 1
29 19 22 1
30 20 25 1
31 1 28 1
32 1 29 1
33 2 30 1
34 2 31 1
35 3 32 1
36 3 33 1
37 4 34 1
38 4 35 1
39 5 36 1
40 10 37 1
41 12 38 1
42 13 39 1
43 14 40 1
44 15 41 1
45 17 42 1
46 20 43 1
47 20 44 1
48 20 45 1
49 27 46 1

And the example.py code looks like this:

from rdkit.Chem import AllChem
from rdkit import Chem

rdkit_mol = Chem.MolFromMol2File("example.mol2", sanitize=False,
                                 removeHs=False)
mol = AllChem.RemoveHs(rdkit_mol)

Running example.py returns the following error:

ValueError: Sanitization error: Can't kekulize mol. Unkekulized atoms: 8 10 11 15 16 17 22 24 25

It seems rdkit cannot interpret the molecule when it tries to remove the
hydrogens, probably because of the format of the mol2 file I used. I
generated the mol2 file from an sdf file with openbabel. Is there a plan
to support parsing mol2 files like this one, or do I need to process the
mol2 file further? I would appreciate any advice!
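
The indices in the error message can be traced back to the mol2 file. The
sketch below assumes they are RDKit's zero-based atom indices and that the
parser keeps the atoms in file order, so index i corresponds to mol2 atom
ID i + 1. The helper names mol2_atom_types and describe_unkekulized are made
up for illustration, and SAMPLE is a two-atom excerpt of the file above:

```python
def mol2_atom_types(mol2_text):
    """Map each 1-based mol2 atom ID to its SYBYL atom type."""
    types = {}
    in_atoms = False
    for line in mol2_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("@"):
            # Section header: accept "@<TRIPOS>ATOM" as well as a bare "@ATOM".
            in_atoms = stripped.endswith("ATOM")
            continue
        fields = stripped.split()
        if in_atoms and len(fields) >= 6:
            # Columns: atom_id, name, x, y, z, atom_type, ...
            types[int(fields[0])] = fields[5]
    return types


def describe_unkekulized(mol2_text, rdkit_indices):
    """Return (rdkit_index, mol2_atom_id, atom_type) for each reported index."""
    types = mol2_atom_types(mol2_text)
    # Assumption: RDKit index i corresponds to mol2 atom ID i + 1.
    return [(i, i + 1, types[i + 1]) for i in rdkit_indices]


# Two atom records copied from example.mol2.
SAMPLE = """@<TRIPOS>ATOM
9 C -1.6179 0.4017 0.1577 C.ar 1 LIG1 0.2173
26 N -0.6088 1.3106 0.0937 N.ar 1 LIG1 -0.1964
@<TRIPOS>BOND
"""

for row in describe_unkekulized(SAMPLE, [8, 25]):
    print(row)
```

Under that assumption, the nine indices in the error correspond to mol2
atoms 9, 11, 12, 16, 17, 18, 23, 25 and 26, all of which are typed C.ar or
N.ar in the file.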


Thanks,

Shuai