Re: [Rdkit-discuss] SMARTS pattern matching of canonical forms of aromatic molecules

2017-09-08 Thread Jason Biggs
Start with your benzene molecule

from rdkit import Chem
m = Chem.MolFromSmiles('c1ccccc1')


Make a pattern using Peter's example, with three aromatic atoms connected
by two aromatic bonds

patt = Chem.MolFromSmarts('a:a:a')


and it's a match:

m.HasSubstructMatch(patt)

>True


Kekulize your mol, and the pattern doesn't match

Chem.rdmolops.Kekulize(m)
m.HasSubstructMatch(patt)
>False


but if you change the smarts pattern to match aromatic atoms connected by
kekulized bonds, it matches

patt2 = Chem.MolFromSmarts('[a]=[a]-[a]')
m.HasSubstructMatch(patt2)
>True

Your original SMARTS query doesn't match because C in a SMARTS string is
specifically an aliphatic carbon, and the atoms keep their aromatic flags
after a plain Kekulize().  Change it to c and it will match.  The original
query would work if you had cleared the aromatic flags when kekulizing:


m = Chem.MolFromSmiles('c1ccccc1')
Chem.rdmolops.Kekulize(m, clearAromaticFlags = True)
patt = Chem.MolFromSmarts('[C]=[C]-[C]')
m.HasSubstructMatch(patt)
>True



So when you kekulize without the clearAromaticFlags option, the aromatic
atoms still match only 'a', not 'A', while the bonds match '=' or '-' but
not ':'.  (The bonds also match '@' and '~', but that's beside the point
here.)

As Peter mentions, if you read in a kekulized SMILES string, by default the
mol you create will not be kekulized (aromaticity is perceived on input),
but it sounds like you are intentionally kekulizing before doing
substructure matching.
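
The three cases above can be compared side by side. This is only a minimal
sketch, assuming a working RDKit install and the full benzene SMILES
'c1ccccc1'; the matches() helper, the results dictionary and the variable
names are made up for illustration.

```python
from rdkit import Chem


def matches(mol, smarts):
    """Hypothetical helper: True if the SMARTS pattern matches the molecule."""
    return mol.HasSubstructMatch(Chem.MolFromSmarts(smarts))


# 1. As read in: aromatic atoms connected by aromatic bonds.
aromatic = Chem.MolFromSmiles('c1ccccc1')

# 2. Kekulized, aromatic flags kept (the default).
kekulized = Chem.MolFromSmiles('c1ccccc1')
Chem.Kekulize(kekulized)

# 3. Kekulized with the aromatic flags cleared.
cleared = Chem.MolFromSmiles('c1ccccc1')
Chem.Kekulize(cleared, clearAromaticFlags=True)

results = {}
for label, mol in [('aromatic', aromatic),
                   ('kekulized', kekulized),
                   ('cleared', cleared)]:
    for smarts in ('a:a:a', '[a]=[a]-[a]', '[C]=[C]-[C]'):
        results[(label, smarts)] = matches(mol, smarts)
        print(label, smarts, results[(label, smarts)])
```

Each of the three patterns should match only the molecule whose atom flags
and bond types it was written for.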



Jason Biggs


On Fri, Sep 8, 2017 at 5:19 PM, James T. Metz via Rdkit-discuss <
rdkit-discuss@lists.sourceforge.net> wrote:

> Hello,
>
> Suppose I read in the SMILES of an aromatic molecule e.g., for
> benzene
>
> c1ccccc1
>
> I then want to convert the molecule to a Kekule representation and
> then perform various SMARTS pattern recognition e.g.
>
> [C]=[C]-[C]
>
> I have tried various Kekule commands in RDKit, but I cannot figure
> out how (or whether it is possible) to recognize a SMARTS pattern for
> a portion of a molecule which is aromatic, but is currently being
> stored as a Kekule structure.
>
> Also, is it possible to generate and store more than one Kekule
> form in RDkit?
>
> Thank you.
>
> Regards,
> Jim Metz
>
>
>
> 
> --
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
>


Re: [Rdkit-discuss] SMARTS pattern matching of canonical forms of aromatic molecules

2017-09-08 Thread Peter S. Shenkin
Hi,

In SMARTS, 'a' matches an aromatic atom. So you would match your molecule
with the pattern 'aaa', or if you wanted to restrict yourself to carbons,
'ccc'.

This would match whether you created the molecule from a Kekulized or an
aromatic SMILES. Remember that it's the molecular recognition code, not the
form of the input SMILES, that determines whether a molecule is aromatic.
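
A quick way to see this, as a minimal sketch assuming a working RDKit install
(the variable names are illustrative): benzene is read once from an aromatic
SMILES and once from a Kekule SMILES, and both are tested against 'ccc'.

```python
from rdkit import Chem

patt = Chem.MolFromSmarts('ccc')

# The same molecule (benzene) written in aromatic and in Kekule form.
from_aromatic = Chem.MolFromSmiles('c1ccccc1')
from_kekule = Chem.MolFromSmiles('C1=CC=CC=C1')

for mol in (from_aromatic, from_kekule):
    # Default sanitization perceives aromaticity for either input form.
    all_aromatic = all(atom.GetIsAromatic() for atom in mol.GetAtoms())
    print(all_aromatic, mol.HasSubstructMatch(patt))
```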

-P.

On Fri, Sep 8, 2017 at 6:19 PM, James T. Metz via Rdkit-discuss <
rdkit-discuss@lists.sourceforge.net> wrote:

> Hello,
>
> Suppose I read in the SMILES of an aromatic molecule e.g., for
> benzene
>
> c1ccccc1
>
> I then want to convert the molecule to a Kekule representation and
> then perform various SMARTS pattern recognition e.g.
>
> [C]=[C]-[C]
>
> I have tried various Kekule commands in RDKit, but I cannot figure
> out how (or whether it is possible) to recognize a SMARTS pattern for
> a portion of a molecule which is aromatic, but is currently being
> stored as a Kekule structure.
>
> Also, is it possible to generate and store more than one Kekule
> form in RDkit?
>
> Thank you.
>
> Regards,
> Jim Metz
>
>
>
> 


[Rdkit-discuss] SMARTS pattern matching of canonical forms of aromatic molecules

2017-09-08 Thread James T. Metz via Rdkit-discuss
Hello,


Suppose I read in the SMILES of an aromatic molecule, e.g., for
benzene

c1ccccc1

I then want to convert the molecule to a Kekule representation and
then perform various SMARTS pattern recognition, e.g.

[C]=[C]-[C]

I have tried various Kekule commands in RDKit, but I cannot figure
out how (or whether it is possible) to recognize a SMARTS pattern for
a portion of a molecule which is aromatic, but is currently being
stored as a Kekule structure.

Also, is it possible to generate and store more than one Kekule
form in RDKit?

Thank you.

Regards,
Jim Metz







Re: [Rdkit-discuss] Using Chem.WrapLogs()

2017-09-08 Thread Noel O'Boyle
Thanks Maciek,

Both of those solutions work on Linux, which is fine for my purposes.
Neither works on Windows (let me know if you want me to file a bug).

Regards,
- Noel

On 8 September 2017 at 15:05, Maciek Wójcikowski 
wrote:

> Hi Noel,
>
> Call sio.seek(0) before the assert, or use sio.getvalue() instead of read().
>
> 
> Pozdrawiam,  |  Best regards,
> Maciek Wójcikowski
> mac...@wojcikowski.pl
>
> 2017-09-08 15:51 GMT+02:00 Noel O'Boyle :
>
>> Hi all,
>>
>> I'd like to capture error messages during SMILES parsing, but am having
>> trouble getting this to work.
>>
>> The following code raises an AssertionError, for example. Is there
>> something here I'm missing? I'm using this from a Windows 7 conda
>> environment, Python 2.7 64-bit, RDKit 2017.03.3, but a similar conda
>> environment is also failing for me on Linux.
>>
>> import sys
>> from rdkit import Chem
>> Chem.WrapLogs()
>> from StringIO import StringIO
>>
>> old_stderr = sys.stderr
>> sio = sys.stderr = StringIO()
>>
>> mol = Chem.MolFromSmiles("c1c")
>> sys.stderr = old_stderr
>>
>> assert sio.read() != ""
>>
>> Regards,
>> - Noel
>>
>> 


Re: [Rdkit-discuss] Using Chem.WrapLogs()

2017-09-08 Thread Maciek Wójcikowski
Hi Noel,

Call sio.seek(0) before the assert, or use sio.getvalue() instead of read().


Pozdrawiam,  |  Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl

2017-09-08 15:51 GMT+02:00 Noel O'Boyle :

> Hi all,
>
> I'd like to capture error messages during SMILES parsing, but am having
> trouble getting this to work.
>
> The following code raises an AssertionError, for example. Is there
> something here I'm missing? I'm using this from a Windows 7 conda
> environment, Python 2.7 64-bit, RDKit 2017.03.3, but a similar conda
> environment is also failing for me on Linux.
>
> import sys
> from rdkit import Chem
> Chem.WrapLogs()
> from StringIO import StringIO
>
> old_stderr = sys.stderr
> sio = sys.stderr = StringIO()
>
> mol = Chem.MolFromSmiles("c1c")
> sys.stderr = old_stderr
>
> assert sio.read() != ""
>
> Regards,
> - Noel
>
> 


Re: [Rdkit-discuss] Using Chem.WrapLogs()

2017-09-08 Thread Andrew Dalke
On Sep 8, 2017, at 15:51, Noel O'Boyle  wrote:
> 
> Hi all,
> 
> I'd like to capture error messages during SMILES parsing, but am having 
> trouble getting this to work.
  ...
> assert sio.read() != ""

That should be a sio.getvalue(). The read() starts from the current file 
position, which is at the end of the previous output.

(Or if you really want a read(), do sio.seek(0) first.)
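
The file-position behaviour can be seen without RDKit at all. The following
is a small sketch using Python 3's io.StringIO (the original post used the
Python 2 StringIO module); the written text merely stands in for whatever
ends up on the redirected stderr, and the variable names are illustrative.

```python
import io

sio = io.StringIO()
sio.write("error message\n")   # stands in for text logged to stderr

after_write = sio.read()       # position is at the end: nothing left to read
whole_buffer = sio.getvalue()  # ignores the position, returns everything

sio.seek(0)                    # rewind to the start ...
after_seek = sio.read()        # ... and read() now sees the text

print(repr(after_write), repr(whole_buffer), repr(after_seek))
```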

Cheers,

Andrew
da...@dalkescientific.com





Re: [Rdkit-discuss] Debian Stretch Python3 Does Not Find RDKit

2017-09-08 Thread Greg Landrum
Apologies for the slow reply; I'm still getting caught up from my vacation.

If you start the system python from the command line, can that find the
rdkit? You can test this as follows:
python -c 'from rdkit import Chem'

if that works, you know that the installation worked and that the problem
is with spyder (that's harder for me to help with, but if you google for
rdkit and spyder you might find some helpful answers).

If the above doesn't work, then we can start trying to diagnose what went
wrong with the install. Please start with:
which python
to make sure that you are in fact using the system python.
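
The same two checks can also be run from inside Python, for instance once
from the command-line interpreter and once from the Spyder console, to see
whether both use the same interpreter. This is only a sketch using the
standard library; the variable name spec is illustrative.

```python
import importlib.util
import sys

# Which interpreter is actually running this code?
print("interpreter:", sys.executable)

# Can this interpreter locate the rdkit package? find_spec returns None if not.
spec = importlib.util.find_spec("rdkit")
if spec is None:
    print("rdkit is not importable from this interpreter")
else:
    print("rdkit found at:", spec.origin)
```

If the two environments print different interpreter paths, the package was
probably installed for one of them only.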

-greg


On Sat, Aug 19, 2017 at 4:05 PM, Stephen P. Molnar 
wrote:

> I have installed the Debian Stretch distribution Spyder3 and RDKit on my
> 64 bit Linux platform.
>
> There were no warning or error messages during the installation process.
>
> However, when I attempted running a cookbook Python script (file
> attached), I got the following:
>
> Python 3.5.3 (default, Jan 19 2017, 14:11:04)
> Type "copyright", "credits" or "license" for more information.
>
> IPython 6.1.0 -- An enhanced Interactive Python.
>
> runfile('/home/comp/Apps/Python/untitled0.py',
> wdir='/home/comp/Apps/Python')
> Traceback (most recent call last):
>
>   File "", line 1, in 
> runfile('/home/comp/Apps/Python/untitled0.py',
> wdir='/home/comp/Apps/Python')
>
>   File 
> "/usr/local/lib/python3.5/dist-packages/spyder/utils/site/sitecustomize.py",
> line 688, in runfile
> execfile(filename, namespace)
>
>   File 
> "/usr/local/lib/python3.5/dist-packages/spyder/utils/site/sitecustomize.py",
> line 101, in execfile
> exec(compile(f.read(), filename, 'exec'), namespace)
>
>   File "/home/comp/Apps/Python/untitled0.py", line 11, in 
> from rdkit import Chem
>
> ImportError: No module named 'rdkit'
>
> I would greatly appreciate pointers towards a solution to this problem.
>
> Thanks in advance.
>
> --
> Stephen P. Molnar, Ph.D.        Life is a fuzzy set
> www.molecular-modeling.net      Stochastic and multivariate
> (614)312-7528 (c)
> Skype: smolnar1
>
> 


[Rdkit-discuss] Using Chem.WrapLogs()

2017-09-08 Thread Noel O'Boyle
Hi all,

I'd like to capture error messages during SMILES parsing, but am having
trouble getting this to work.

The following code raises an AssertionError, for example. Is there
something here I'm missing? I'm using this from a Windows 7 conda
environment, Python 2.7 64-bit, RDKit 2017.03.3, but a similar conda
environment is also failing for me on Linux.

import sys
from rdkit import Chem
Chem.WrapLogs()
from StringIO import StringIO

old_stderr = sys.stderr
sio = sys.stderr = StringIO()

mol = Chem.MolFromSmiles("c1c")
sys.stderr = old_stderr

assert sio.read() != ""

Regards,
- Noel


Re: [Rdkit-discuss] GetConformerRMS() vs GetBestRMS()

2017-09-08 Thread Greg Landrum
Hi Anikó,

Both functions do an alignment. The big difference here is coming because
GetBestRMS() looks at all 2D-identical alignments of the molecules to each
other while GetConformerRMS() only does the alignment once: using the atom
numbers.

Practically speaking what does that mean for your molecule?

Here's a 2D sketch without the Hs:
[image: Inline image 1]

By 2D symmetry, atoms 8 and 9 are equivalent, as are atoms 4 and 5.

So there are four possible 2D isomorphisms between those molecules:
8->8, 9->9, 4->4, 5->5  (all others the same)
8->9, 9->8, 4->4, 5->5  (all others the same)
8->8, 9->9, 4->5, 5->4  (all others the same)
8->9, 9->8, 4->5, 5->4  (all others the same)
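
The count of four follows from two independent swaps. The sketch below
reproduces the list with the standard library only; it is an illustration of
the counting, not how RDKit finds the mappings (that is done by substructure
matching), and the names equivalent_pairs and mappings are made up here.

```python
from itertools import product

# Pairs of atoms that are equivalent by 2D symmetry (from the sketch above).
equivalent_pairs = [(8, 9), (4, 5)]

mappings = []
for swaps in product([False, True], repeat=len(equivalent_pairs)):
    mapping = {}
    for (a, b), swap in zip(equivalent_pairs, swaps):
        # Either keep the pair as it is or exchange its two atoms.
        mapping[a], mapping[b] = (b, a) if swap else (a, b)
    mappings.append(mapping)

for mapping in mappings:
    print(", ".join(f"{src}->{dst}" for src, dst in mapping.items()))
```

Each additional independent pair of equivalent atoms doubles the number of
mappings.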

GetBestRMS() does alignments for all of these and takes the one that
provides the lowest RMS value.
GetConformerRMS() only does the first alignment and uses that RMS.

In general you want to always use GetBestRMS() for symmetric molecules.
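
As a rough sketch of how the two calls compare, assuming a working RDKit
install: the molecule below (cumene) is a made-up stand-in for the attached
one, chosen because it has symmetry-equivalent atoms, and the conformers are
generated on the fly rather than read from confs.sdf.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

# Hypothetical test molecule with symmetry-equivalent methyl groups.
mol = Chem.AddHs(Chem.MolFromSmiles('CC(C)c1ccccc1'))
conf_ids = list(AllChem.EmbedMultipleConfs(mol, numConfs=2, randomSeed=42))
mol = Chem.RemoveHs(mol)  # heavy atoms only keeps the number of mappings small

# Aligns the two conformers using the atom numbering as the only mapping.
rms_by_index = AllChem.GetConformerRMS(mol, conf_ids[0], conf_ids[1])

# Tries every symmetry-equivalent mapping and keeps the lowest RMS.
rms_best = AllChem.GetBestRMS(mol, mol, conf_ids[0], conf_ids[1])

print(rms_by_index, rms_best)
```

Because the atom-number mapping is one of the mappings GetBestRMS() tries,
its result should never be larger than the GetConformerRMS() value. Both
calls move the conformer coordinates in place while aligning.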

Does that help?
-greg
p.s. Adding the Hs leads to additional mappings which just makes the
overall problem worse.




On Fri, Sep 8, 2017 at 9:26 AM, Udvarhelyi, Aniko <
aniko.udvarhe...@novartis.com> wrote:

> Dear All,
>
>
>
> I would like to compute RMS values between conformers of the same molecule
> that are not aligned. Unfortunately, I can't get along very well with the
> GetConformerRMS() function: it gives far too high RMS values even for
> conformers that are clearly (near-)identical as judged by visual inspection
> after alignment. I attach one example of 2 conformers of a molecule that
> are near-identical.
>
> GetConformerRMS() returns an RMS value of 1.32 (with Hydrogens) and 0.70
> (disregarding Hydrogens).
>
> GetBestRMS() returns an RMS value of 0.03 (with Hydrogens) and 0.02
> (disregarding Hydrogens).
>
>
>
> Clearly, the GetBestRMS() result is the one I'd expect (I am interested
> in the all-atom RMSDs with Hydrogens). I guess GetConformerRMS() cannot
> align the two conformers properly, hence the high RMS value. My question is
> why not? The atom ordering and all bonds are exactly the same in both
> conformers. Why do I need the GetBestRMS() alignment of all possible
> permutations of matching atom orders in both conformers to get the
> alignment correct? I would like to avoid using GetBestRMS() as it is far
> too slow for my purposes (processing many molecules with many conformers).
>
>
>
> Many thanks for any hints,
>
> Anikó
>
> 


[Rdkit-discuss] GetConformerRMS() vs GetBestRMS()

2017-09-08 Thread Udvarhelyi, Aniko
Dear All,

I would like to compute RMS values between conformers of the same molecule that
are not aligned. Unfortunately, I can't get along very well with the
GetConformerRMS() function: it gives far too high RMS values even for
conformers that are clearly (near-)identical as judged by visual inspection
after alignment. I attach one example of 2 conformers of a molecule that are
near-identical.
GetConformerRMS() returns an RMS value of 1.32 (with Hydrogens) and 0.70 
(disregarding Hydrogens).
GetBestRMS() returns an RMS value of 0.03 (with Hydrogens) and 0.02 
(disregarding Hydrogens).

Clearly, the GetBestRMS() result is the one I'd expect (I am interested in the
all-atom RMSDs with Hydrogens). I guess GetConformerRMS() cannot align the two
conformers properly, hence the high RMS value. My question is why not? The atom
ordering and all bonds are exactly the same in both conformers. Why do I need
the GetBestRMS() alignment of all possible permutations of matching atom orders
in both conformers to get the alignment correct? I would like to avoid using
GetBestRMS() as it is far too slow for my purposes (processing many molecules
with many conformers).

Many thanks for any hints,
Anikó


confs.sdf
Description: confs.sdf