Re: [Rdkit-discuss] problem with substructure matching using SMARTS

2013-11-18 Thread Michal Krompiec
Dear Greg,
Thanks a lot! This solves my problem.
Best wishes,
Michal


On 18 November 2013 10:27, Greg Landrum wrote:
> Dear Michal,
>
> On Mon, Nov 18, 2013 at 10:55 AM, Michal Krompiec wrote:
>>
>> Hello,
>> Substructure matching with SMARTS behaves strangely sometimes - see code
>> below.
>> The pattern with [H] matches, but the pattern with [H,F] does not
>> (both should match).
>>
>> from rdkit import Chem
>> mol=Chem.MolFromSmiles('Clc2sccc2[H]')
>> mol=Chem.AddHs(mol)
>> p1=Chem.MolFromSmarts('c2sccc2[H]')
>> p2=Chem.MolFromSmarts('c2sccc2[H,F]')
>> print(mol.HasSubstructMatch(p1))
>> print(mol.HasSubstructMatch(p2))
>
>
> The problem is that an "H" in SMARTS normally doesn't mean what you think it
> does.
>
> You can see what's going on using MolToSmarts:
> >>> print(Chem.MolToSmarts(p1))
> c1:,-s:,-c:,-c:,-c:,-1-,:[#1]
> >>> print(Chem.MolToSmarts(p2))
> c1:,-s:,-c:,-c:,-c:,-1-,:[H1,F]
>
> In the second case, the H has been converted into a query for an atom that
> has exactly one H attached. The only time that the symbol "H" in a query is
> interpreted as "an atom with atomic number 1" is when it shows up as "[H]",
> as in your first example.
>
> The safest way to deal with H in SMARTS is to use [#1]:
> >>> p1=Chem.MolFromSmarts('c2sccc2[#1]')
> >>> p2=Chem.MolFromSmarts('c2sccc2[#1,F]')
> >>> print(mol.HasSubstructMatch(p1))
> True
> >>> print(mol.HasSubstructMatch(p2))
> True
>
> This is confusing, but it corresponds (at least I think it does) to the
> "spec" from Daylight:
> http://www.daylight.com/dayhtml/doc/theory/theory.smarts.html
>
> -greg
>
>

--
DreamFactory - Open Source REST & JSON Services for HTML5 & Native Apps
OAuth, Users, Roles, SQL, NoSQL, BLOB Storage and External API Access
Free app hosting. Or install the open source package on any LAMP server.
Sign up and see examples for AngularJS, jQuery, Sencha Touch and Native!
http://pubads.g.doubleclick.net/gampad/clk?id=63469471&iu=/4140/ostg.clktrk
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] problem with substructure matching using SMARTS

2013-11-18 Thread Greg Landrum
Dear Michal,

On Mon, Nov 18, 2013 at 10:55 AM, Michal Krompiec wrote:

> Hello,
> Substructure matching with SMARTS behaves strangely sometimes - see code
> below.
> The pattern with [H] matches, but the pattern with [H,F] does not
> (both should match).
>
> from rdkit import Chem
> mol=Chem.MolFromSmiles('Clc2sccc2[H]')
> mol=Chem.AddHs(mol)
> p1=Chem.MolFromSmarts('c2sccc2[H]')
> p2=Chem.MolFromSmarts('c2sccc2[H,F]')
> print(mol.HasSubstructMatch(p1))
> print(mol.HasSubstructMatch(p2))
>

The problem is that an "H" in SMARTS normally doesn't mean what you think
it does.

You can see what's going on using MolToSmarts:
>>> print(Chem.MolToSmarts(p1))
c1:,-s:,-c:,-c:,-c:,-1-,:[#1]
>>> print(Chem.MolToSmarts(p2))
c1:,-s:,-c:,-c:,-c:,-1-,:[H1,F]

In the second case, the H has been converted into a query for an atom that
has exactly one H attached. The only time that the symbol "H" in a query is
interpreted as "an atom with atomic number 1" is when it shows up as "[H]",
as in your first example.
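
To make the hydrogen-count reading concrete, here is a small sketch. The two test molecules (methanol and fluoromethane) and the helper name matched_symbols are chosen purely for illustration and are not part of the original report. It lists which atoms the one-atom query [H,F] picks out when the hydrogens are left implicit:

```python
from rdkit import Chem

# Inside brackets, an "H" combined with other primitives is read as a
# hydrogen-count query (H1 = exactly one attached hydrogen), not as
# "a hydrogen atom".
query = Chem.MolFromSmarts('[H,F]')

def matched_symbols(smiles, patt):
    """Return the element symbols of the atoms matched by a one-atom pattern."""
    mol = Chem.MolFromSmiles(smiles)
    return [mol.GetAtomWithIdx(idx).GetSymbol()
            for (idx,) in mol.GetSubstructMatches(patt)]

# Methanol: the oxygen carries exactly one hydrogen, so it matches as "H1".
methanol_hits = matched_symbols('CO', query)
# Fluoromethane: the fluorine matches through the ",F" alternative.
fluoromethane_hits = matched_symbols('CF', query)

print(methanol_hits)
print(fluoromethane_hits)
```

In neither case is a hydrogen atom itself matched, which is why the [H,F] pattern fails on the explicit H in the thiophene example.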

The safest way to deal with H in SMARTS is to use [#1]:
>>> p1=Chem.MolFromSmarts('c2sccc2[#1]')
>>> p2=Chem.MolFromSmarts('c2sccc2[#1,F]')
>>> print(mol.HasSubstructMatch(p1))
True
>>> print(mol.HasSubstructMatch(p2))
True
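
One related point that is easy to miss: [#1] can only match hydrogens that exist as real atoms in the molecule, which is what the AddHs call in the original script provides. The sketch below compares the three patterns from this thread with and without that call. The helper name match_table and the side-by-side layout are illustrative assumptions, not part of the original code:

```python
from rdkit import Chem

SMARTS_PATTERNS = ['c2sccc2[H]', 'c2sccc2[H,F]', 'c2sccc2[#1,F]']

def match_table(mol):
    """Map each SMARTS string to whether it matches the given molecule."""
    return {smarts: mol.HasSubstructMatch(Chem.MolFromSmarts(smarts))
            for smarts in SMARTS_PATTERNS}

# By default MolFromSmiles removes the explicit [H] and keeps it as an
# implicit hydrogen, so there is no H atom for [#1] to match yet.
mol = Chem.MolFromSmiles('Clc2sccc2[H]')
without_hs = match_table(mol)

# AddHs turns every hydrogen into a real atom in the molecular graph.
with_hs = match_table(Chem.AddHs(mol))

print(without_hs)
print(with_hs)
```

Without AddHs none of the three patterns should match; with it, only the [H,F] variant should still fail.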

This is confusing, but it corresponds (at least I think it does) to the
"spec" from Daylight:
http://www.daylight.com/dayhtml/doc/theory/theory.smarts.html

-greg


[Rdkit-discuss] problem with substructure matching using SMARTS

2013-11-18 Thread Michal Krompiec
Hello,
Substructure matching with SMARTS behaves strangely sometimes - see code below.
The pattern with [H] matches, but the pattern with [H,F] does not
(both should match).

from rdkit import Chem
mol=Chem.MolFromSmiles('Clc2sccc2[H]')
mol=Chem.AddHs(mol)
p1=Chem.MolFromSmarts('c2sccc2[H]')
p2=Chem.MolFromSmarts('c2sccc2[H,F]')
print(mol.HasSubstructMatch(p1))
print(mol.HasSubstructMatch(p2))

Best wishes,
Michal
