Re: [Rdkit-discuss] problem with substructure matching using SMARTS
Dear Greg,

Thanks a lot! This solves my problem.

Best wishes,
Michal

On 18 November 2013 10:27, Greg Landrum wrote:
> Dear Michal,
>
> On Mon, Nov 18, 2013 at 10:55 AM, Michal Krompiec wrote:
>>
>> Hello,
>> Substructure matching with SMARTS behaves strangely sometimes - see code
>> below.
>> The pattern with [H] matches, but the pattern with [H,F] does not
>> (both should match).
>>
>> from rdkit import Chem
>> mol=Chem.MolFromSmiles('Clc2sccc2[H]')
>> mol=Chem.AddHs(mol)
>> p1=Chem.MolFromSmarts('c2sccc2[H]')
>> p2=Chem.MolFromSmarts('c2sccc2[H,F]')
>> print(mol.HasSubstructMatch(p1))
>> print(mol.HasSubstructMatch(p2))
>
> The problem is that an "H" in SMARTS normally doesn't mean what you think it
> does.
>
> You can see what's going on using MolToSmarts:
> >>> print(Chem.MolToSmarts(p1))
> c1:,-s:,-c:,-c:,-c:,-1-,:[#1]
> >>> print(Chem.MolToSmarts(p2))
> c1:,-s:,-c:,-c:,-c:,-1-,:[H1,F]
>
> In the second case, the H has been converted into a query for an atom that
> has exactly one H attached. The only time that the symbol "H" in a query is
> interpreted as "an atom with atomic number 1" is when it shows up as "[H]",
> as in your first example.
>
> The safest way to deal with H in SMARTS is to use [#1]:
> >>> p1=Chem.MolFromSmarts('c2sccc2[#1]')
> >>> p2=Chem.MolFromSmarts('c2sccc2[#1,F]')
> >>> print(mol.HasSubstructMatch(p1))
> True
> >>> print(mol.HasSubstructMatch(p2))
> True
>
> This is confusing, but it corresponds (at least I think it does) to the
> "spec" from Daylight:
> http://www.daylight.com/dayhtml/doc/theory/theory.smarts.html
>
> -greg

___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
Re: [Rdkit-discuss] problem with substructure matching using SMARTS
Dear Michal,

On Mon, Nov 18, 2013 at 10:55 AM, Michal Krompiec wrote:
> Hello,
> Substructure matching with SMARTS behaves strangely sometimes - see code
> below.
> The pattern with [H] matches, but the pattern with [H,F] does not
> (both should match).
>
> from rdkit import Chem
> mol=Chem.MolFromSmiles('Clc2sccc2[H]')
> mol=Chem.AddHs(mol)
> p1=Chem.MolFromSmarts('c2sccc2[H]')
> p2=Chem.MolFromSmarts('c2sccc2[H,F]')
> print(mol.HasSubstructMatch(p1))
> print(mol.HasSubstructMatch(p2))

The problem is that an "H" in SMARTS normally doesn't mean what you think it does.

You can see what's going on using MolToSmarts:
>>> print(Chem.MolToSmarts(p1))
c1:,-s:,-c:,-c:,-c:,-1-,:[#1]
>>> print(Chem.MolToSmarts(p2))
c1:,-s:,-c:,-c:,-c:,-1-,:[H1,F]

In the second case, the H has been converted into a query for an atom that has exactly one H attached. The only time that the symbol "H" in a query is interpreted as "an atom with atomic number 1" is when it shows up as "[H]", as in your first example.

The safest way to deal with H in SMARTS is to use [#1]:
>>> p1=Chem.MolFromSmarts('c2sccc2[#1]')
>>> p2=Chem.MolFromSmarts('c2sccc2[#1,F]')
>>> print(mol.HasSubstructMatch(p1))
True
>>> print(mol.HasSubstructMatch(p2))
True

This is confusing, but it corresponds (at least I think it does) to the "spec" from Daylight:
http://www.daylight.com/dayhtml/doc/theory/theory.smarts.html

-greg
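The rule described above can be turned into a quick check on a pattern string before it is handed to MolFromSmarts. The sketch below is only a heuristic and is not part of RDKit: the helper name find_ambiguous_h is hypothetical, it relies on nothing but Python's re module, and it assumes that a bare "H" term combined with other terms inside a bracket atom will be read as a hydrogen-count query, as happened with [H,F]. It inspects innermost brackets only, so recursive SMARTS is not fully handled.

```python
import re

# Innermost bracket atoms only; recursive SMARTS such as [$(...)] is not fully parsed.
_BRACKET_ATOM = re.compile(r"\[([^\[\]]*)\]")
# Logical operators that separate terms inside a bracket atom.
_TERM_SEPARATORS = re.compile(r"[,;&]")


def find_ambiguous_h(smarts):
    """Return bracket atoms in which a bare 'H' term is combined with other terms.

    Hypothetical helper (an assumption for illustration, not an RDKit function).
    """
    flagged = []
    for match in _BRACKET_ATOM.finditer(smarts):
        body = match.group(1)
        if body == "H":
            # Exactly "[H]": the one form read as "atomic number 1".
            continue
        terms = _TERM_SEPARATORS.split(body)
        if "H" in terms:
            flagged.append(match.group(0))
    return flagged


for pattern in ("c2sccc2[H]", "c2sccc2[H,F]", "c2sccc2[#1,F]"):
    print(pattern, find_ambiguous_h(pattern))
# c2sccc2[H] []
# c2sccc2[H,F] ['[H,F]']
# c2sccc2[#1,F] []
```

Any atom it reports is a candidate for rewriting with [#1] (for example [H,F] as [#1,F]) when a hydrogen atom, rather than a hydrogen count, was intended.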
[Rdkit-discuss] problem with substructure matching using SMARTS
Hello,

Substructure matching with SMARTS behaves strangely sometimes - see code below.
The pattern with [H] matches, but the pattern with [H,F] does not (both should match).

from rdkit import Chem
mol=Chem.MolFromSmiles('Clc2sccc2[H]')
mol=Chem.AddHs(mol)
p1=Chem.MolFromSmarts('c2sccc2[H]')
p2=Chem.MolFromSmarts('c2sccc2[H,F]')
print(mol.HasSubstructMatch(p1))
print(mol.HasSubstructMatch(p2))

Best wishes,
Michal