[Rdkit-discuss] Kekulizing thiazoles

2017-01-17 Thread Chris Arthur
Dear all


I have a molecule containing a thiazole ring which has been generated by a
reaction in RDKit.

Sanitising the molecule gives a kekulization error...

Chem.SanitizeMol(forwardProduct_)
Traceback (most recent call last):

  File "", line 1, in 
Chem.SanitizeMol(forwardProduct_)

ValueError: Sanitization error: Can't kekulize mol

I can generate a SMILES string from it (I had thought of doing a SMILES to
molecule conversion)

# RDKit-generated SMILES that started us down this rabbit hole
temp = Chem.MolToSmiles('CC(=O)c1sc(C2CCOCC2)nc1C')

But this fails

ArgumentError: Python argument types in
rdkit.Chem.rdmolfiles.MolToSmiles(str)
did not match C++ signature:
MolToSmiles(class RDKit::ROMol mol, bool isomericSmiles=False, bool
kekuleSmiles=False, int rootedAtAtom=-1, bool canonical=True, bool
allBondsExplicit=False, bool allHsExplicit=False)


So I thought I would try with simpler thiazoles

# ChemDraw's SMILES representation
temp = Chem.MolToSmiles('C1=CN=CS1')

# Wikipedia's SMILES for thiazole
temp = Chem.MolToSmiles('n1ccsc1')

These, however, also fail.

 Can anyone suggest how I can proceed in order to sanitize such molecules?

 Thanks

 Chris



-- 
Dr Christopher J. Arthur
School of Chemistry
University of Bristol
BRISTOL, BS8 1TS,  UK
E-mail:  chris.art...@bristol.ac.uk

Office: (+44 117) 331 7192
Mass Spectrometry Lab: (+44 117) 331 7358.
FAX: (+44 117) 927 7985

WWW URL: http://www.chm.bris.ac.uk/staff/carthur.htm
LinkedIn  Profile: https://www.linkedin.com/in/drchrisarthur
--
Check out the vibrant tech community on one of the world's most
engaging tech sites, SlashDot.org! http://sdm.link/slashdot
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Kekulizing thiazoles

2017-01-17 Thread Curt Fischer
To troubleshoot your sanitization problems, I think it would be helpful if
you could share your SMARTS reaction string and the rdkit version you are
using.

I just simulated the Hantzsch thiazole synthesis shown on Wikipedia, and
everything worked normally for me.  Admittedly, my reaction definition is
overly tailored toward these two reactants, but I think it shows that rdkit
can *Sanitize()* thiazoles correctly.

# Hantzsch thiazole synthesis
from rdkit import Chem
from rdkit.Chem import AllChem

thiourea = Chem.MolFromSmiles('CN(C)C(=S)N')
# The aryl ring is incomplete in the archived message; phenyl is assumed here.
haloketone = Chem.MolFromSmiles('c1ccccc1C(=O)C(C)Cl')
rxn_smarts = '[NH2:1][C:2](=[S:3])[NH0:4].[C:5](=[O:6])[C:7][Cl:8]>>[N:4][c:2]1[s:3][c:5][c:7][n:1]1'
rxn = AllChem.ReactionFromSmarts(rxn_smarts)
product = rxn.RunReactants((thiourea, haloketone))[0][0]
Chem.SanitizeMol(product)
Chem.MolToSmiles(product)

Out[33]: 'Cc1nc(N(C)C)sc1-c1ccccc1'


On Tue, Jan 17, 2017 at 9:29 AM, Curt Fischer 
wrote:

> I can't answer your root question, but if you want to go to SMILES and
> then back, I think you want *Chem.MolFromSmiles()*, not
> *Chem.MolToSmiles()*.
>
> Curt
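
A minimal sketch of the direction Curt describes, assuming a standard RDKit
install: Chem.MolFromSmiles() takes a SMILES string and returns a sanitized
molecule, while Chem.MolToSmiles() takes a molecule and returns a string. The
three SMILES strings are the ones from the original post; the variable names
are only illustrative.

```python
from rdkit import Chem

# SMILES strings taken from the original post.
smiles_list = [
    'CC(=O)c1sc(C2CCOCC2)nc1C',  # RDKit-generated product SMILES
    'C1=CN=CS1',                 # ChemDraw's Kekule form of thiazole
    'n1ccsc1',                   # aromatic form from Wikipedia
]

# MolFromSmiles: string -> Mol (sanitized by default; None if parsing fails).
mols = [Chem.MolFromSmiles(smi) for smi in smiles_list]

# MolToSmiles: Mol -> canonical SMILES string (it does not accept a str).
canonical = [Chem.MolToSmiles(mol) for mol in mols]

for smi, can in zip(smiles_list, canonical):
    print(smi, '->', can)
```

If all three parse, the Kekule and aromatic thiazole inputs should end up as
the same canonical SMILES, which suggests the kekulization failure comes from
the reaction product rather than from thiazoles as such.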


Re: [Rdkit-discuss] PMI API

2017-01-17 Thread Guillaume GODIN
Thanks Brian,


PBF = 0 <=> planar (2D), and PBF > 0 <=> 3D.


I forgot that point.


BR,

Dr. Guillaume GODIN
Principal Scientist
Chemoinformatic & Datamining
Innovation
CORPORATE R&D DIVISION
DIRECT LINE +41 (0)22 780 3645
MOBILE  +41 (0)79 536 1039
Firmenich SA
RUE DES JEUNES 1 | CASE POSTALE 239 | CH-1211 GENEVE 8



Re: [Rdkit-discuss] PMI API

2017-01-17 Thread Brian Kelley
Looks like I'm late to the game.  I don't know about the PMI descriptors
per se, but if a planar molecule is in its inertial frame, the coordinates
along one of the axes (x, y or z) should all be zero, which means that one
of M1x, M1y or M1z should be zero.

We had some good experimentation with multipole expansion of moments
(essentially based on the description of electrostatic multipoles) that
might be nice to add to the PMI framework.

Greg, I'm assuming that the Moments.py we open-sourced a while back is
similarly broken?  I'm attaching it here for posterity, but it does appear
to match the MOE PMIs.





Moments.py
Description: Binary data


Re: [Rdkit-discuss] PMI API

2017-01-17 Thread Brian Kelley
I think we agree here.  I was talking about the raw moment (M1z), not
the moment of inertia (MI1); I should have made the distinction more
explicit.  Moments are not necessarily moments of inertia.  The terminology
gets confusing.

After a brief discussion with Greg, it turns out that Moments.py does the
correct calculation, which indirectly verifies MOE and the newer RDKit
implementation.

Cheers,
 Brian



Re: [Rdkit-discuss] PMI API

2017-01-17 Thread Brian Kelley
In the inertial frame this is trivial. With the current RDKit, however,
can't you just use the plane of best fit (PBF) to tell planar from 3D?  For
a linear molecule, you can use the PMI descriptors.

See PBF in RDKit

http://pubs.acs.org/doi/abs/10.1021/ci300293f

Cheers,
 Brian
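
A rough sketch of how that suggestion could look with the current RDKit
Python API, using rdMolDescriptors.CalcPMI1() for the linear case and
rdMolDescriptors.CalcPBF() for planar versus 3D. The helper names
(mol_with_coords, classify_shape), the hand-written coordinates and the
tolerance are illustrative assumptions, not something from the thread.

```python
import math

from rdkit import Chem
from rdkit.Chem import rdMolDescriptors
from rdkit.Geometry import Point3D


def mol_with_coords(smiles, coords):
    """Build a molecule and attach one conformer with the given coordinates."""
    mol = Chem.MolFromSmiles(smiles)
    conf = Chem.Conformer(mol.GetNumAtoms())
    for idx, (x, y, z) in enumerate(coords):
        conf.SetAtomPosition(idx, Point3D(x, y, z))
    mol.AddConformer(conf)
    return mol


def classify_shape(mol, tol=1e-4):
    """Return 'linear', 'planar' or '3D'; tol is an arbitrary illustrative cutoff."""
    if rdMolDescriptors.CalcPMI1(mol) < tol:  # smallest principal moment ~ 0
        return 'linear'
    if rdMolDescriptors.CalcPBF(mol) < tol:   # every atom lies in the best-fit plane
        return 'planar'
    return '3D'


# Idealised heavy-atom geometries (angstrom), written by hand for the example.
co2 = mol_with_coords('O=C=O', [(-1.16, 0.0, 0.0), (0.0, 0.0, 0.0), (1.16, 0.0, 0.0)])
benzene = mol_with_coords(
    'c1ccccc1',
    [(1.39 * math.cos(k * math.pi / 3), 1.39 * math.sin(k * math.pi / 3), 0.0)
     for k in range(6)])
neopentane = mol_with_coords(
    'CC(C)(C)C',
    [(0.89, 0.89, 0.89), (0.0, 0.0, 0.0), (0.89, -0.89, -0.89),
     (-0.89, 0.89, -0.89), (-0.89, -0.89, 0.89)])

for name, mol in [('CO2', co2), ('benzene', benzene), ('neopentane', neopentane)]:
    print(name, classify_shape(mol))
```

The linear test comes first because a linear molecule also lies in a plane;
for real conformers the tolerance would need to be chosen with some care.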


Re: [Rdkit-discuss] PMI API

2017-01-17 Thread Chris Earnshaw
The dimensions along one of the axes of a planar molecule in its inertial
frame will be zero, but the principal moments of inertia will all be
non-zero. The moment of inertia about an axis can only be zero if all the
atoms in the molecule are precisely aligned on that axis. That's only
possible for linear molecules. There's no way to draw a straight line axis
through all the atoms in a non-linear molecule, which would be a
requirement for the corresponding moment of inertia to be zero.

Chris
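
The point above can be checked with a few lines of plain Python. This is only
a sketch under simplifying assumptions: the coordinates are idealised, already
centred on the centre of mass, and aligned so that the coordinate axes are the
principal axes. The function names are made up for the example.

```python
import math


def axis_moments(atoms):
    """Moments of inertia about x, y and z for atoms given as (mass, x, y, z).

    Assumes the coordinates are already in the inertial frame (centre of mass
    at the origin, coordinate axes along the principal axes).
    """
    ix = sum(m * (y * y + z * z) for m, x, y, z in atoms)
    iy = sum(m * (x * x + z * z) for m, x, y, z in atoms)
    iz = sum(m * (x * x + y * y) for m, x, y, z in atoms)
    return ix, iy, iz


def extents(atoms):
    """Largest absolute coordinate along x, y and z."""
    return tuple(max(abs(atom[k]) for atom in atoms) for k in (1, 2, 3))


# Planar example: six carbons on a regular hexagon in the xy-plane.
ring = [(12.011, 1.39 * math.cos(k * math.pi / 3),
         1.39 * math.sin(k * math.pi / 3), 0.0) for k in range(6)]

# Linear example: O=C=O lying along the x-axis.
co2 = [(15.999, -1.16, 0.0, 0.0), (12.011, 0.0, 0.0, 0.0), (15.999, 1.16, 0.0, 0.0)]

print('ring extents:', extents(ring))       # the z extent is zero
print('ring moments:', axis_moments(ring))  # all three moments are non-zero
print('CO2 moments: ', axis_moments(co2))   # the moment about the molecular axis is zero
```

For the planar ring the moment about the axis perpendicular to the plane is
the sum of the other two, so it is the largest of the three rather than zero.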



Re: [Rdkit-discuss] PMI API

2017-01-17 Thread Guillaume GODIN
Great! I also noticed confusing usage of "moment of inertia" in those descriptors.


For example, in the WHIM case, we need to know whether the molecule is linear, planar or 
3D in order to compute the descriptors.


I have not found an easy way to determine this yet.


BR,

Dr. Guillaume GODIN
Principal Scientist
Chemoinformatic & Datamining
Innovation
CORPORATE R&D DIVISION
DIRECT LINE +41 (0)22 780 3645
MOBILE  +41 (0)79 536 1039
Firmenich SA
RUE DES JEUNES 1 | CASE POSTALE 239 | CH-1211 GENEVE 8



Re: [Rdkit-discuss] PMI API

2017-01-17 Thread Chris Earnshaw
The new version looks good to me as far as I can test it. PMI and NPR are
still fine, the radius of gyration is right (for an extremely artificial
test system) and the asphericity index also seems right (despite my best
efforts to confuse things further - sorry about that!). It also highlights
even more confusion in the Todeschini article - the approximate asphericity
values for prolate and oblate molecules are reversed.

The only (very trivial) thing I've spotted is the comment in the
inertialShapeFactor function. 'planar or no coordinates' should be 'linear
or no coordinates' to avoid confusion.

Chris
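
For readers who want to try the descriptors mentioned above, here is a small
sketch. It assumes a current RDKit release, where they are exposed in
rdkit.Chem.Descriptors3D; the thread does not say whether the names in the PR
at the time were the same. The molecule (ethanol) and the random seed are
arbitrary choices for illustration.

```python
from rdkit import Chem
from rdkit.Chem import AllChem, Descriptors3D

# The 3D descriptors need a conformer, so embed one first.
mol = Chem.AddHs(Chem.MolFromSmiles('CCO'))
status = AllChem.EmbedMolecule(mol, randomSeed=42)  # 0 on success, -1 on failure
if status != 0:
    raise RuntimeError('conformer embedding failed')

values = {
    'PMI1': Descriptors3D.PMI1(mol),
    'PMI2': Descriptors3D.PMI2(mol),
    'PMI3': Descriptors3D.PMI3(mol),
    'NPR1': Descriptors3D.NPR1(mol),
    'NPR2': Descriptors3D.NPR2(mol),
    'RadiusOfGyration': Descriptors3D.RadiusOfGyration(mol),
    'Asphericity': Descriptors3D.Asphericity(mol),
    'InertialShapeFactor': Descriptors3D.InertialShapeFactor(mol),
}

for name, value in values.items():
    print(f'{name}: {value:.4f}')
```

The principal moments are reported in ascending order, and the normalized
ratios are NPR1 = PMI1/PMI3 and NPR2 = PMI2/PMI3.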

On 16 January 2017 at 09:30, Greg Landrum  wrote:

>
>
> On Mon, Jan 16, 2017 at 10:22 AM, Chris Earnshaw wrote:
>
>>
>> Either way, it makes it rather hard to trust their derivations generally
>> - especially as there appear to be other errors (e.g. the denominator in
>> eq. 16 should be the square root of the given sum of squares, according to
>> their reference).
>>
>
> Indeed. Given the problems encountered, I went back and checked some
> additional references to find definitions of the descriptors. The results
> are in this PR, which I'd love feedback on if you have time to take a look:
> https://github.com/rdkit/rdkit/pull/1265
>
> I didn't manage to find any information about "inertial shape factor" and
> don't have access to the references cited in the Todeschini paper, but I
> think the others are now reasonably reliable.
>
> -greg
>
>
>


Re: [Rdkit-discuss] Kekulizing thiazoles

2017-01-17 Thread Greg Landrum
I don't have anything to add to this other than to agree with Curt: I think
that the existing code should work fine with thiazoles.

@Curt: thanks for providing this detailed and thought-through answer!

-greg

