Re: [Rdkit-discuss] The fragmentMatcher (SubstructMatcher) is not as good as expected
Thanks, it works! I appreciate that RDKit is so strict about the representation of molecules and substructures. I have learned a lot from this mailing list.

Hongbin Yang

From: Paolo Tosco
Date: 2016-10-27 17:19
To: 杨弘宾; rdkit-discuss
Subject: Re: [Rdkit-discuss] The fragmentMatcher (SubstructMatcher) is not as good as expected

> Dear Hongbin,
>
> I am afraid the SMARTS you are using is not valid: no ring in the SSSR can
> have fewer than 3 members, or it wouldn't be a ring. If you change [a!r0]
> into, for instance, [a!r3], then you'll find the match you are looking for.
>
> Cheers,
> p.

___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
Re: [Rdkit-discuss] SVG BUG (Re: Fwd: 2D drawing with atoms labeled by index)
On 2016-10-26 23:39, Peter S. Shenkin wrote:

> Hey, by the way, my agenda is trying to understand all this.

(Using Python syntax instead of ML.)

Recommended by TFM:

    from "http://www.w3.org/2000/svg" import *

All SVG names should work with or without the package qualifier: point(), line(), etc., as well as svg.point(), svg.line(), ...

The RDKit way:

    import "http://www.w3.org/2000/svg" as svg

All SVG names must be prefixed: svg.point(), svg.line(). Using an unqualified point() should throw an error. (Unless there's another 'point' in the name resolution chain, yadda, yadda, yadda.)

Unfortunately, I find it entirely unsurprising that a lot of software out there doesn't get this right. :(

Dima
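The analogy can be checked against a namespace-aware XML parser. The sketch below is not from the thread: it uses Python's standard xml.etree.ElementTree, and the two tiny documents are made up for illustration. It shows that an element in the default namespace and the same element written with an explicit svg: prefix resolve to the same qualified name, which is what a consumer should be comparing.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

# Default namespace: element names are written without a prefix.
default_form = f'<svg xmlns="{SVG_NS}"><line/></svg>'

# Prefixed namespace: every element name carries the "svg:" prefix.
prefixed_form = f'<svg:svg xmlns:svg="{SVG_NS}"><svg:line/></svg:svg>'


def qualified_tags(xml_text):
    """Return the namespace-qualified tag of every element in the document."""
    return [element.tag for element in ET.fromstring(xml_text).iter()]


default_tags = qualified_tags(default_form)
prefixed_tags = qualified_tags(prefixed_form)

print(default_tags)
print(default_tags == prefixed_tags)
```

Software that compares the raw text "line" against "svg:line" instead of the qualified names will treat the two documents differently, which is the kind of mistake the message describes.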
Re: [Rdkit-discuss] reading multiple conformers from file
It would seem that a major issue with RDKit's multiconformer format is the inability to associate structure-level and atom-level properties with individual conformations. It's not quite orthogonal to the question of how to read, say, a multiconformer SD file into RDKit's multiconformer format, because the conformers in said SD file could carry such properties, and that information would be lost.

-P.

On Thu, Oct 27, 2016 at 6:20 AM, Thomas Evangelidis wrote:

> Hello Greg,
>
> Is the canonical SMILES string always unique for every isomer and
> tautomerization state of a molecule? If yes, then I have already written a
> function to load multiple molecules and their conformers, which I can share
> here.
>
> best
> Thomas
>
> PS: thanks to David for pointing this out.
>
> On 27 October 2016 at 05:20, Greg Landrum wrote:
>
>> Hi Thomas,
>>
>> You're right, reading multiple conformations out of an SDF does seem like
>> one of those common operations. Unfortunately the RDKit does not currently
>> support it in an easy way.
>>
>> A Python implementation of this would be a good topic for Friday's UGM
>> hackathon; we can see if anyone finds it interesting enough to work on.
>>
>> -greg
>>
>> On Tue, Oct 25, 2016 at 2:16 AM, Thomas Evangelidis wrote:
>>
>>> Hello everyone,
>>>
>>> I am a new user of RDKit and I was looking in the documentation for an
>>> easy way to load multiple conformers from a structure file like .sdf. The
>>> code must 1) distinguish between different protonation states of the same
>>> molecule, 2) create a new Mol() object for each protonation state and load
>>> into it the respective conformers.
>>>
>>> Apparently I can work out a solution for 1) using mol.GetProp('_Name'),
>>> mol.GetNumAtoms(), mol.GetNumBonds() and other properties, but I was
>>> wondering if there is a more straightforward way to do it.
>>> For 2) I guess I must iterate over all molecules in the input file,
>>> create new Mol() objects (one for each protonation state of each ligand)
>>> and add conformers to these new Mol() objects. Again this sounds easily
>>> programmable, but it also sounds like a very common operation, so I was
>>> wondering if it has already been implemented in a function.
>>>
>>> thanks in advance
>>> Thomas
>>>
>>> --
>>> Thomas Evangelidis
>>> Research Specialist
>>> CEITEC - Central European Institute of Technology
>>> Masaryk University
>>> Kamenice 5/A35/1S081,
>>> 62500 Brno, Czech Republic
>>>
>>> email: tev...@pharm.uoa.gr
>>>        teva...@gmail.com
>>>
>>> website: https://sites.google.com/site/thomasevangelidishomepage/
Re: [Rdkit-discuss] Categorising reactions using SMARTS
Looking into this further, I've decided to use the Python option again, as this seems to have more functions. I ran the following example, where rxn is the original reaction and qrxn is the 'query' for categorisation:

    In [2]: import rdkit
    In [3]: from rdkit import Chem
    In [4]: from rdkit.Chem import rdChemReactions
    In [5]: rxn = rdChemReactions.ReactionFromSmarts('Nc1nc(Cl)c2[nH]cnc2n1.OCC1CCCCC1>>Nc1nc(OCC2CCCCC2)c2[nH]cnc2n1')
    In [6]: qrxn = rdChemReactions.ReactionFromSmarts('[cH1:1]1:[c:2](-[CH2:7]-[CH2:8]-[NH2:9]):[c:3]:[c:4]:[c:5]:[c:6]:1.[#6:11]-[CH1;R0:10]=[OD1]>>[c:1]12:[c:2](-[CH2:7]-[CH2:8]-[NH1:9]-[C:10]-2(-[#6:11])):[c:3]:[c:4]:[c:5]:[c:6]:1')
    In [7]: rdChemReactions.HasReactionSubstructMatch(rxn,qrxn)

    Pre-condition Violation
    getNumImplicitHs() called without preceding call to calcImplicitValence()
    Violation occurred on line 165 in file C:\Users\riccardo\Anaconda\conda-bld\work\Code\GraphMol\Atom.cpp
    Failed Expression: d_implicitValence > -1

    RuntimeError                    Traceback (most recent call last)
    ----> 1 rdChemReactions.HasReactionSubstructMatch(rxn,qrxn)

    RuntimeError: Pre-condition Violation
        getNumImplicitHs() called without preceding call to calcImplicitValence()
        Violation occurred on line 165 in file Code\GraphMol\Atom.cpp
        Failed Expression: d_implicitValence > -1
        RDKIT: 2016.03.1
        BOOST: 1_59

Is the code working as designed? (Ultimately I want to feed lists of these together, but I'm trying one at a time for now.)

On 27 October 2016 at 12:02, James Wallace wrote:

> Hi,
> I'm trying to replicate the Schneider categorisations with a local set of
> reactions that I have stored in SMILES. I currently have the categorisation
> filters as Reaction SMARTS, and I was hoping to do a standard substructure
> comparison between the SMARTS and the SMILES, but can't seem to do that.
>
> I'm using the Java wrapped version, and I can see how to import a
> ChemicalReaction as SMILES or SMARTS, I can't see how to compose such a
> query. Can anyone offer me any help or pointers?
>
> Thanks in advance,
> James
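A plausible reading of the error, not confirmed anywhere in this thread, is that the reactants of rxn were parsed as query templates, so their implicit valences were never computed, while the H-count primitives in qrxn (such as [cH1] and [NH2]) need them. The sketch below sidesteps HasReactionSubstructMatch instead of fixing it: it parses each side of the reaction SMILES into ordinary sanitized molecules and tests every template of the query against them. The helper names and the phenethylamine/acetaldehyde example reaction are assumptions made for illustration only.

```python
from rdkit import Chem
from rdkit.Chem import rdChemReactions

# Query reaction copied from the message above.
QUERY_SMARTS = (
    '[cH1:1]1:[c:2](-[CH2:7]-[CH2:8]-[NH2:9]):[c:3]:[c:4]:[c:5]:[c:6]:1'
    '.[#6:11]-[CH1;R0:10]=[OD1]'
    '>>[c:1]12:[c:2](-[CH2:7]-[CH2:8]-[NH1:9]-[C:10]-2(-[#6:11]))'
    ':[c:3]:[c:4]:[c:5]:[c:6]:1'
)


def all_templates_match(templates, smiles_part):
    """True if every query template is found in at least one molecule."""
    mols = [Chem.MolFromSmiles(smi) for smi in smiles_part.split('.') if smi]
    if any(mol is None for mol in mols):
        raise ValueError('could not parse: ' + smiles_part)
    return all(
        any(mol.HasSubstructMatch(template) for mol in mols)
        for template in templates
    )


def reaction_matches_query(reaction_smiles, query):
    """Hypothetical helper: compare a reaction SMILES with a query reaction."""
    reactants, _agents, products = reaction_smiles.split('>')
    reactant_templates = [
        query.GetReactantTemplate(i)
        for i in range(query.GetNumReactantTemplates())
    ]
    product_templates = [
        query.GetProductTemplate(i)
        for i in range(query.GetNumProductTemplates())
    ]
    return (all_templates_match(reactant_templates, reactants)
            and all_templates_match(product_templates, products))


qrxn = rdChemReactions.ReactionFromSmarts(QUERY_SMARTS)

# Illustrative reaction that fits the query (an assumption, not from the thread).
ring_closure = 'NCCc1ccccc1.CC=O>>CC1NCCc2ccccc12'
# The reaction from the message above, which should not fit this query.
purine_ether = 'Nc1nc(Cl)c2[nH]cnc2n1.OCC1CCCCC1>>Nc1nc(OCC2CCCCC2)c2[nH]cnc2n1'

print(reaction_matches_query(ring_closure, qrxn))
print(reaction_matches_query(purine_ether, qrxn))
```

This checks each template independently and ignores atom mapping between reactants and products, so it is looser than a true reaction substructure match; it is meant as a first-pass filter.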
[Rdkit-discuss] Categorising reactions using SMARTS
Hi,

I'm trying to replicate the Schneider categorisations with a local set of reactions that I have stored as SMILES. I currently have the categorisation filters as reaction SMARTS, and I was hoping to do a standard substructure comparison between the SMARTS and the SMILES, but can't seem to do that.

I'm using the Java-wrapped version, and while I can see how to import a ChemicalReaction as SMILES or SMARTS, I can't see how to compose such a query. Can anyone offer me any help or pointers?

Thanks in advance,
James
Re: [Rdkit-discuss] reading multiple conformers from file
Hello Greg,

Is the canonical SMILES string always unique for every isomer and tautomerization state of a molecule? If yes, then I have already written a function to load multiple molecules and their conformers, which I can share here.

best
Thomas

PS: thanks to David for pointing this out.

On 27 October 2016 at 05:20, Greg Landrum wrote:

> Hi Thomas,
>
> You're right, reading multiple conformations out of an SDF does seem like
> one of those common operations. Unfortunately the RDKit does not currently
> support it in an easy way.
>
> A Python implementation of this would be a good topic for Friday's UGM
> hackathon; we can see if anyone finds it interesting enough to work on.
>
> -greg

--
Thomas Evangelidis
Research Specialist
CEITEC - Central European Institute of Technology
Masaryk University
Kamenice 5/A35/1S081,
62500 Brno, Czech Republic

email: tev...@pharm.uoa.gr
       teva...@gmail.com

website: https://sites.google.com/site/thomasevangelidishomepage/
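The function mentioned above was not posted in this message, so the following is only a minimal sketch of the idea, under stated assumptions: records are grouped by canonical SMILES, and records with the same SMILES are assumed to list their atoms in the same order. Tautomers and protonation states are different molecular graphs and therefore get different canonical SMILES; whether stereoisomers are separated depends on stereochemistry having been assigned, which this sketch does not attempt. The names group_conformers and make_demo_sdf are hypothetical, and the demo SD data is generated in memory only to keep the example self-contained.

```python
from rdkit import Chem
from rdkit.Chem import AllChem


def group_conformers(supplier):
    """Merge records that share a canonical SMILES into multi-conformer Mols.

    Assumes records with the same SMILES also share the same atom ordering.
    """
    grouped = {}
    for mol in supplier:
        if mol is None:  # skip records that could not be parsed
            continue
        key = Chem.MolToSmiles(mol)
        if key not in grouped:
            grouped[key] = Chem.Mol(mol)  # copy; keeps the first conformer
        else:
            grouped[key].AddConformer(mol.GetConformer(), assignId=True)
    return grouped


def make_demo_sdf():
    """Build a small SD file in memory: two ethanol records, two amine states."""
    blocks = []
    for smiles in ('CCO', 'CCO', 'CCN', 'CC[NH3+]'):
        mol = Chem.MolFromSmiles(smiles)
        AllChem.Compute2DCoords(mol)
        blocks.append(Chem.MolToMolBlock(mol) + '$$$$\n')
    return ''.join(blocks)


supplier = Chem.SDMolSupplier()
supplier.SetData(make_demo_sdf())
molecules = group_conformers(supplier)

for smiles, mol in molecules.items():
    print(smiles, mol.GetNumConformers())
```

For a real file, the supplier would be created as Chem.SDMolSupplier('ligands.sdf') (file name hypothetical). Properties stored on the merged records are not carried over per conformer here, which is the limitation raised elsewhere in this thread.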
Re: [Rdkit-discuss] The fragmentMatcher (SubstructMatcher) is not as good as expected
Dear Hongbin,

I am afraid the SMARTS you are using is not valid: no ring in the SSSR can have fewer than 3 members, or it wouldn't be a ring. If you change [a!r0] into, for instance, [a!r3], then you'll find the match you are looking for.

Cheers,
p.
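A short sketch of the suggested fix, using Chem.MolFromSmarts directly. Nitrobenzene is assumed as the target molecule, and the check for None is added here because MolFromSmarts returns None when a pattern cannot be parsed, which is easy to miss.

```python
from rdkit import Chem

mol = Chem.MolFromSmiles('c1ccccc1[N+](=O)[O-]')  # nitrobenzene (assumed target)

# Same pattern as in the question, with r0 replaced by r3 as suggested.
pattern = Chem.MolFromSmarts('[a!r3][NX3+](=[OX1])([O-])')
if pattern is None:
    raise ValueError('SMARTS could not be parsed')

# Atom indices in the match are 0-based.
match = mol.GetSubstructMatch(pattern)
print(match)
```

RDKit numbers atoms from 0, so the indices are one lower than those reported by tools that count from 1.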
[Rdkit-discuss] The fragmentMatcher (SubstructMatcher) is not as good as expected
Hi,

I tried using RDKit to match fragments against compounds, only to find that RDKit did not handle my SMARTS well. The following is the notebook I worked in:

    from rdkit import Chem
    from rdkit.Chem import AllChem
    from rdkit.Chem import FragmentMatcher
    from rdkit.Chem.Draw import IPythonConsole

    In [49]: p = FragmentMatcher.FragmentMatcher()
             p.Init('[a!r0][NX3+](=[OX1])([O-])')

    In [50]: mol = Chem.MolFromSmiles('c1ccccc1[N+](=O)[O-]')
             mol
    Out[50]: [molecule depiction]

    In [51]: p.HasMatch(mol)
    Out[51]: 0

    In [52]: print(Chem.MolFromSmarts('[a!r0][NX3+](=[OX1])([O-])'))
    None

However, Open Babel matched the substructure without problems. Even the "or" operator was available, as in "[a!r0][$([NX3+](=[OX1])([O-])),$([NX3](=O)=O)]".

    >>> import pybel
    >>> s = pybel.Smarts('[a!r0][NX3+](=[OX1])([O-])')
    >>> a = pybel.readstring('smi', 'c1ccccc1[N+](=O)[O-]')
    >>> s.findall(a)
    [(6, 7, 8, 9)]

It is a pity that RDKit can calculate the topological distance between two atoms but cannot match these fragments... Is there a better API that I didn't find?

Hongbin Yang 杨弘宾
Research: Toxicophore and Chemoinformatics
Pharmaceutical Science, School of Pharmacy
East China University of Science and Technology
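On the "or" operator: RDKit's SMARTS parser also supports recursive SMARTS combined with a comma. The sketch below is an illustration rather than part of the original message; it writes the ring primitive as !r3 instead of !r0 (the change proposed in the reply to this message) and assumes nitrobenzene as the target.

```python
from rdkit import Chem

# Either the charge-separated or the pentavalent form of the nitro group.
NITRO_EITHER_FORM = '[a!r3][$([NX3+](=[OX1])([O-])),$([NX3](=O)=O)]'

pattern = Chem.MolFromSmarts(NITRO_EITHER_FORM)
if pattern is None:
    raise ValueError('SMARTS could not be parsed')

mol = Chem.MolFromSmiles('c1ccccc1[N+](=O)[O-]')  # nitrobenzene (assumed target)

# Each match holds the aromatic atom and the nitro nitrogen (0-based indices).
matches = mol.GetSubstructMatches(pattern)
print(matches)
```

Only the two atoms written outside the recursive `$(...)` environments appear in a match; the oxygens are checked but not reported.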
Re: [Rdkit-discuss] reading multiple conformers from file
The RDKit has support for the TPL format, an old BioCad/MSI/Accelrys format. It's easy to imagine something better, but this is at least already there, and there could be other software that speaks it:
https://github.com/rdkit/rdkit/blob/master/Code/GraphMol/FileParsers/test_data/cmpd2.tpl

I'd still like to do a decent JSON format, and adding multi-confs to that would be logical.

On Thu, Oct 27, 2016 at 6:58 AM, David Cosgrove wrote:

> I've been wondering if, now that you can get decent conformations from
> RDKit, it would be worth devising a multi-conformation file format to make
> reading multi-conf molecules faster for VS purposes. In my experience,
> pulling all the conformers out of an ASCII file such as an SDF can become
> the RDS for pharmacophore searching. Something to think about at the
> hackathon maybe, and certainly something that deserves a new email thread.
>
> Dave
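To make the JSON idea concrete, here is a purely hypothetical layout, written with the Python standard library only. It is neither an RDKit format nor the TPL format; the field names (smiles, conformers, coords, props) and both helper functions are assumptions for illustration. One molecule is stored once, followed by a list of conformers that each carry their own coordinates and properties.

```python
import json


def conformers_to_json(smiles, conformers):
    """Serialise one molecule and its conformers to a JSON string.

    `conformers` is a list of dicts, each with 'coords' (one [x, y, z] per
    atom) and an optional 'props' dict of per-conformer properties.
    """
    atom_counts = {len(conf['coords']) for conf in conformers}
    if len(atom_counts) > 1:
        raise ValueError('all conformers must have the same number of atoms')
    record = {
        'smiles': smiles,
        'conformers': [
            {'coords': conf['coords'], 'props': conf.get('props', {})}
            for conf in conformers
        ],
    }
    return json.dumps(record)


def conformers_from_json(text):
    """Inverse of conformers_to_json: return (smiles, list of conformers)."""
    record = json.loads(text)
    return record['smiles'], record['conformers']


# Made-up heavy-atom coordinates and energies for two ethanol conformers.
demo = [
    {'coords': [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [2.1, 1.3, 0.0]],
     'props': {'energy': -1.25}},
    {'coords': [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [2.1, -1.3, 0.0]],
     'props': {'energy': -1.10}},
]

text = conformers_to_json('CCO', demo)
smiles, loaded = conformers_from_json(text)
print(smiles, len(loaded), loaded[0]['props'])
```

Storing the connectivity once and only coordinates per conformer avoids re-parsing the full structure for every conformer, which is the cost described in the quoted message; a binary encoding would likely be faster still, but that is outside this sketch.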