Re: [Cdk-user] SMARTS matching after implicit to explicit hydrogen conversion and SSSRing finder
Hi Nick,

Yep, that should work. Depending on what you're trying to do, though, you might not need to extract the rings first; just run the SMARTS search, which can then give you the atoms of the ring.

Additionally, the new Cycles facade gives you much faster algorithms:

    Cycles.sssr(m.clone()).toRingSet()

You may also consider whether you really want to use the SSSR: http://www.jcheminf.com/content/6/1/3/abstract

J

On 8 Feb 2014, at 11:08, Nick Vandewiele nick.vandewi...@ugent.be wrote:

Hi John!

Thanks for the research on this. It would have taken me a lot of time to find this out... So [C]1[C][C][C][C]1 is perceived as aromatic. This is in accordance with the different behavior I see when I run the same code with six-membered rings instead of five-membered rings: for six-membered rings there's no problem, presumably because they are not perceived as aromatic.

So what I do is first clone the original atom container (to prevent it from updating the implicit H count), and then run the atom typing and hydrogen adding on each of the IRings:

    IRingSet ringSet = new SSSRFinder(m.clone()).findSSSR(); // find SSSR rings
    for (IAtomContainer ring : ringSet.atomContainers()) {
        AtomContainerManipulator.percieveAtomTypesAndConfigureAtoms(ring);
        CDKHydrogenAdder.getInstance(blr).addImplicitHydrogens(ring);
        boolean found = sqt.matches(ring); // true
    }

Regards,
Nick

From: John May [john...@ebi.ac.uk]
Sent: Friday, February 07, 2014 6:48 PM
To: Nick Vandewiele
Cc: cdk-user@lists.sourceforge.net
Subject: Re: [Cdk-user] SMARTS matching after implicit to explicit hydrogen conversion and SSSRing finder

Okay, now I've actually tracked it down: the issue has to do with aromaticity (kind of) and with the SSSR providing a container for the ring atoms/bonds. With implicit hydrogens the substructure from the SSSRFinder looks like this:

    [CH]11

Note that C1(O)1 is really [CH]1([OH])[CH2][CH2][CH2][CH2]1.
In the CDK, removing atoms doesn't update the neighbours' hydrogen counts, hence the first carbon keeps an implicit hydrogen count of 1. When all hydrogens are explicit we get:

    [C]1[C][C][C][C]1

For some reason the aromaticity algorithm finds it to be aromatic. I can fix that, but for now you can update the valences (i.e. AtomType/AddHydrogens). But consider this: the atoms in the IRing are the same objects as in the molecule, so adjusting the hydrogen count for the ring atoms also affects the parent molecule. You can even run it, and you'll get:

    [CH2]1([CH2](O[H])([CH2]([CH2]([CH2]1([H])[H])([H])[H])([H])[H])[H])([H])[H]

It would be even worse when there are multiple rings. I've never liked IRing anyway; it's much better to refer to rings by index without creating a new container.

Cheers,
J

On 7 Feb 2014, at 17:12, John May john...@ebi.ac.uk wrote:

Doh, of course. The SMARTS has a quirk that 'C1' matches the 'CDKConstants.ISINRING' flag. We can fix this without a patch; just add this before you match:

    SmartsMatchers.prepare(ring, true);

The SMARTSQueryTool should be doing it already; I'm not sure why it isn't, though (that's the bug).

On 7 Feb 2014, at 17:03, John May john...@ebi.ac.uk wrote:

No problem. Master, but nothing should have changed…

J

On 7 Feb 2014, at 16:47, Nick Vandewiele nick.vandewi...@ugent.be wrote:

John,

Thanks for the fast response! However, adding or removing dashes in the SMARTS string doesn't change the outcome when I try it. Also, using your proposed alternative, e.g.:

    Pattern pattern = Ullmann.findSubstructure(SMARTSParser.parse(smarts, blr));
    for (IAtomContainer ring : ringSet.atomContainers()) {
        System.out.println(pattern.matches(ring));
    }

does not change the outcome (i.e. false) for me either. Are you using the 1.5.4 or master branch?
Regards,
Nick

From: John May [mailto:john...@ebi.ac.uk]
Sent: Friday, February 07, 2014 5:28 PM
To: Nick Vandewiele
Cc: cdk-user@lists.sourceforge.net
Subject: Re: [Cdk-user] SMARTS matching after implicit to explicit hydrogen conversion and SSSRing finder

Okay, it's the bond matching… C-1-C-C-C-C1 works but C1-C-C-C-C1 doesn't. Should be an easy fix.

J

On 7 Feb 2014, at 16:03, Nick Vandewiele nick.vandewi...@ugent.be wrote:

Hi,

I am using CDK 1.5.4 and detected some behavior of the SMARTS matcher that I didn't quite understand. When I search for a SMARTS pattern in one of the rings detected by the SSSRFinder algorithm, whether the pattern is found depends on whether implicit hydrogens were converted to explicit ones. If explicit hydrogens are present, the pattern is not found; if only implicit hydrogens are present, the pattern IS found.
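John's point that the IRing holds the very same atom objects as the parent molecule can be illustrated without CDK. The following is a minimal sketch under stated assumptions: `ToyAtom`, `RingAliasingDemo` and the hand-built cyclopentanol are hypothetical stand-ins, not CDK types. It shows that the ring carbon which lost its OH neighbour keeps an implicit hydrogen count of 1, and that raising that count through the ring view also changes the parent molecule.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy atom for illustration only; not a CDK class.
final class ToyAtom {
    final String symbol;
    int implicitH;

    ToyAtom(String symbol, int implicitH) {
        this.symbol = symbol;
        this.implicitH = implicitH;
    }
}

final class RingAliasingDemo {

    // Ring atoms as indices into the parent molecule (no second container needed).
    static final int[] RING = {0, 1, 2, 3, 4};

    // Cyclopentanol: atoms 0-4 form the ring, atom 5 is the O bonded to atom 0.
    static List<ToyAtom> cyclopentanol() {
        List<ToyAtom> atoms = new ArrayList<>();
        atoms.add(new ToyAtom("C", 1));                             // CH carrying the OH
        for (int i = 1; i < 5; i++) atoms.add(new ToyAtom("C", 2)); // CH2
        atoms.add(new ToyAtom("O", 1));                             // OH
        return atoms;
    }

    // Valence of a ring atom as seen from the ring alone: two ring bonds plus implicit H.
    static int valenceWithinRing(ToyAtom a) {
        return 2 + a.implicitH;
    }

    // Builds a ring "container" that holds the SAME atom objects as the parent.
    static List<ToyAtom> extractRing(List<ToyAtom> mol) {
        List<ToyAtom> ring = new ArrayList<>();
        for (int idx : RING) ring.add(mol.get(idx));
        return ring;
    }

    public static void main(String[] args) {
        List<ToyAtom> mol = cyclopentanol();
        List<ToyAtom> ring = extractRing(mol);

        // The C that lost its O neighbour still has implicitH = 1, so it looks like [CH] with valence 3.
        System.out.println(valenceWithinRing(ring.get(0))); // 3

        // "Fixing" the hydrogen count through the ring also edits the parent molecule,
        // which now has a CH2 that is still bonded to the OH (five bonds in total).
        ring.get(0).implicitH = 2;
        System.out.println(mol.get(0).implicitH);      // 2
        System.out.println(ring.get(0) == mol.get(0)); // true
    }
}
```

Reading and writing through `mol.get(idx)` for each index in the hypothetical `RING` array keeps a single copy of every atom, which is closer to the idea of referring to rings by index than copying them into a separate container.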
Re: [Cdk-user] SMARTS matching after implicit to explicit hydrogen conversion and SSSRing finder
No problem. Master, but nothing should have changed…

J

On 7 Feb 2014, at 16:03, Nick Vandewiele nick.vandewi...@ugent.be wrote:

Hi,

I am using CDK 1.5.4 and detected some behavior of the SMARTS matcher that I didn't quite understand. When I search for a SMARTS pattern in one of the rings detected by the SSSRFinder algorithm, whether the pattern is found depends on whether implicit hydrogens were converted to explicit ones. If explicit hydrogens are present, the pattern is not found; if only implicit hydrogens are present, the pattern IS found.
This code was used:

    String smiles = "C1C(O)CCC1";
    IChemObjectBuilder blr = SilentChemObjectBuilder.getInstance();
    SmilesParser smipar = new SmilesParser(blr);
    IAtomContainer m = smipar.parseSmiles(smiles);
    String smarts = "C1-C-C-C-C1";
    SMARTSQueryTool sqt = new SMARTSQueryTool(smarts, blr);
    AtomContainerManipulator.convertImplicitToExplicitHydrogens(m);
    IRingSet ringSet = new SSSRFinder(m).findSSSR(); // find SSSR rings
    for (IAtomContainer ring : ringSet.atomContainers()) {
        boolean found = sqt.matches(ring); // false (should be true)
    }

Although the release notes of 1.5.4 are very informative, I couldn't find an answer explaining this behavior. So my question is two-fold:

1) How do I ensure that the pattern is found, even when explicit hydrogens are used in the atom container?
2) What is happening under the hood here? Is this behavior normal?

Regards,
Nick

--
Managing the Performance of Cloud-Based Applications
Take advantage of what the Cloud has to offer - Avoid Common Pitfalls.
Read the Whitepaper.
http://pubads.g.doubleclick.net/gampad/clk?id=121051231iu=/4140/ostg.clktrk
___
Cdk-user mailing list
Cdk-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/cdk-user
Re: [Cdk-user] SMARTS matching after implicit to explicit hydrogen conversion and SSSRing finder
Confirmed; I'll look into it tonight. For now, if you don't need the SMARTS stereochemistry, you can actually use the Pattern directly with the query container. This gives the correct answer:

    Pattern pattern = Pattern.findSubstructure(SMARTSParser.parse(smarts, blr));
    for (IAtomContainer ring : ringSet.atomContainers()) {
        System.out.println(pattern.matches(ring));
    }
Re: [Cdk-user] SMARTS matching after implicit to explicit hydrogen conversion and SSSRing finder
John,

Thanks for the fast response! However, adding or removing dashes in the SMARTS string doesn't change the outcome when I try it. Also, using your proposed alternative, e.g.:

    Pattern pattern = Ullmann.findSubstructure(SMARTSParser.parse(smarts, blr));
    for (IAtomContainer ring : ringSet.atomContainers()) {
        System.out.println(pattern.matches(ring));
    }

does not change the outcome (i.e. false) for me either. Are you using the 1.5.4 or master branch?

Regards,
Nick
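For reference on the question of dashes: under the Daylight SMARTS rules, an unspecified bond between two atoms means "single or aromatic", while '-' means a single, non-aromatic bond. A ring-closure digit may carry its bond symbol at either end, so in C-1-C-C-C-C1 all five bonds are '-', whereas in C1-C-C-C-C1 the ring-closure bond is left unspecified. The sketch below is a simplified, library-free illustration of those primitives; `ToyBond` and `SmartsBondDemo` are hypothetical names, only a handful of bond primitives are covered, and this is not how CDK's matcher is implemented.

```java
// Hypothetical, simplified bond model for illustration; not CDK's IBond.
final class ToyBond {
    final int order;        // Kekulé bond order: 1, 2 or 3
    final boolean aromatic; // flag set by aromaticity perception

    ToyBond(int order, boolean aromatic) {
        this.order = order;
        this.aromatic = aromatic;
    }
}

final class SmartsBondDemo {

    /**
     * Matches a single SMARTS bond primitive against a target bond using the
     * Daylight definitions. A null symbol stands for an unspecified bond
     * (for example an unlabelled ring closure), meaning "single or aromatic".
     */
    static boolean matches(Character symbol, ToyBond b) {
        if (symbol == null) {
            return b.order == 1 || b.aromatic;
        }
        switch (symbol) {
            case '-': return b.order == 1 && !b.aromatic; // single, non-aromatic
            case '=': return b.order == 2 && !b.aromatic; // double, non-aromatic
            case '#': return b.order == 3 && !b.aromatic; // triple
            case ':': return b.aromatic;                  // aromatic
            case '~': return true;                        // any bond
            default:
                throw new IllegalArgumentException("primitive not covered: " + symbol);
        }
    }

    public static void main(String[] args) {
        ToyBond saturated = new ToyBond(1, false);
        ToyBond flaggedAromatic = new ToyBond(1, true);

        System.out.println(matches('-', saturated));        // true
        System.out.println(matches(null, saturated));       // true
        System.out.println(matches('-', flaggedAromatic));  // false
        System.out.println(matches(null, flaggedAromatic)); // true
    }
}
```

By these definitions both C-1-C-C-C-C1 and C1-C-C-C-C1 should match a saturated five-membered ring, while explicit '-' bonds stop matching once the target ring's bonds carry an aromatic flag.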
Re: [Cdk-user] SMARTS Matching
Hi Lochana,

The problem is with your SMARTS query. The dollar sign means recursive matching, and X means the total number of connections (neighbouring atoms, including implicit hydrogens). So you are asking for any atom that is connected to an aliphatic nitrogen with 4 other atoms attached to the nitrogen, or an atom that is connected to an aromatic nitrogen with 4 other atoms attached to the nitrogen. I cannot think of any compounds that have an aromatic nitrogen connected to 4 atoms.

What you probably want to check for is valence (which is a v instead of an X). Also, there is no reason to use a recursive query here, nor to have two separate queries for the aromatic and aliphatic forms: the atomic number (#7) can be used instead, and it matches both. This is the query I would use:

    [#7v4]

The CDK implementation of SMARTS is pretty good. I don't believe chirality or regiochemistry has been included yet, but otherwise http://www.daylight.com/dayhtml_tutorials/languages/smarts/index.html has a very good tutorial.

From: cdk-user-requ...@lists.sourceforge.net
To: cdk-user@lists.sourceforge.net
Date: 10/15/2011 07:05 AM
Subject: Cdk-user Digest, Vol 65, Issue 8

Send Cdk-user mailing list submissions to
cdk-user@lists.sourceforge.net
To subscribe or unsubscribe via the World Wide Web, visit
https://lists.sourceforge.net/lists/listinfo/cdk-user
or, via email, send a message with subject or body 'help' to
cdk-user-requ...@lists.sourceforge.net
You can reach the person managing the list at
cdk-user-ow...@lists.sourceforge.net
When replying, please edit your Subject line so it is more specific than "Re: Contents of Cdk-user digest..."

Today's Topics:
1. SMARTS Matching (lochana menikarachchi)
2. Re: SMARTS Matching (Rajarshi Guha)

--
Message: 1
Date: Fri, 14 Oct 2011 08:10:50 -0700 (PDT)
From: lochana menikarachchi locha...@yahoo.com
Subject: [Cdk-user] SMARTS Matching
To: cdk-user@lists.sourceforge.net
Message-ID: 1318605050.62208.yahoomail...@web114512.mail.gq1.yahoo.com
Content-Type: text/plain; charset=iso-8859-1

Hi All,

I was trying to use SMARTSQueryTool to match all aliphatic and aromatic N with 4 connections:

    [$([NX4+]),$([nX4+])]

The molecules were read from PubChem SDF files. The query gives me only aliphatic N, and it also gives me things with 3 bonds, say an N connected to a =O and 2 others.

Thanks.

__
Lochana C. Menikarachchi
Post Doctoral Research Fellow
Department of Pharmaceutical Sciences
School of Pharmacy
69, North Eagleville Rd, Unit 3092
University of Connecticut
Storrs, CT 06269-3092
Lab: 860-486-1591
Home: 860-450-1335

-- next part --
An HTML attachment was scrubbed...

--
Message: 2
Date: Fri, 14 Oct 2011 11:20:13 -0400
From: Rajarshi Guha rajarshi.g...@gmail.com
Subject: Re: [Cdk-user] SMARTS Matching
To: lochana menikarachchi locha...@yahoo.com
Cc: cdk-user@lists.sourceforge.net
Message-ID: CAC8sKVf9rqgGOTJ8O=kp242mrqnmgzbs8d0ctswmmdch5xn...@mail.gmail.com
Content-Type: text/plain; charset=ISO-8859-1

Please send examples of molecules that don't match the patterns.

--
Rajarshi Guha
NIH Chemical Genomics Center

--
All the data continuously generated in your IT infrastructure contains a definitive record of customers, application performance, security threats, fraudulent activity and more. Splunk takes this data and makes sense of it. Business sense. IT sense. Common sense.
http://p.sf.net/sfu/splunk-d2d-oct
___
Cdk-user mailing list
Cdk-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/cdk-user
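The X and v primitives contrasted in the reply can be computed as the Daylight SMARTS documentation defines them: X is the total number of connections (explicit neighbours plus implicit hydrogens), and v is the total bond order (the sum of bond orders, counting each implicit hydrogen as 1). Below is a minimal, library-free sketch, assuming each atom is described by a hypothetical array of Kekulé bond orders to its heavy-atom neighbours plus an implicit hydrogen count; `SmartsCountsDemo` is an illustrative name, not a CDK class.

```java
final class SmartsCountsDemo {

    // X: total connections = explicit neighbours + implicit hydrogens.
    static int totalConnections(int[] bondOrders, int implicitH) {
        return bondOrders.length + implicitH;
    }

    // v: total bond order = sum of bond orders + one per implicit hydrogen.
    static int totalValence(int[] bondOrders, int implicitH) {
        int v = implicitH;
        for (int order : bondOrders) v += order;
        return v;
    }

    public static void main(String[] args) {
        // Quaternary ammonium N+ (e.g. tetramethylammonium): four single bonds, no H.
        int[] ammonium = {1, 1, 1, 1};
        // Nitro N+: single bond to C, double bond to O, single bond to O-.
        int[] nitro = {1, 2, 1};

        System.out.println("ammonium X=" + totalConnections(ammonium, 0)
                + " v=" + totalValence(ammonium, 0)); // ammonium X=4 v=4
        System.out.println("nitro X=" + totalConnections(nitro, 0)
                + " v=" + totalValence(nitro, 0));    // nitro X=3 v=4
    }
}
```

Under these definitions X separates the two nitrogens (4 versus 3 connections), while v4 holds for both of them.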
[Cdk-user] SMARTS Matching
Hi All,

I was trying to use SMARTSQueryTool to match all aliphatic and aromatic N with 4 connections:

    [$([NX4+]),$([nX4+])]

The molecules were read from PubChem SDF files. The query gives me only aliphatic N, and it also gives me things with 3 bonds, say an N connected to a =O and 2 others.

Thanks.

__
Lochana C. Menikarachchi
Post Doctoral Research Fellow
Department of Pharmaceutical Sciences
School of Pharmacy
69, North Eagleville Rd, Unit 3092
University of Connecticut
Storrs, CT 06269-3092
Lab: 860-486-1591
Home: 860-450-1335
[Cdk-user] SMARTS matching and aromaticity detection
Using CDK 1.2.2 SMARTSQueryTool and SmilesParser, the SMARTS query [F,Cl,Br,I]C-C=C doesn't hit the left two SMILES listed below, but does hit the right two. I have also found that the query hits the SD file related to the right two SMILES using MDLV2000Reader, aromaticity detection, and the hydrogen adder (I am unsure whether the SD file was created from the SMILES or vice versa). Clearly, the query shouldn't hit any of these. There seems to be a problem with aromaticity detection.

I have looked through the bug reports and noticed that Egon's recent patch for sp2 N's might work for the first issue (I haven't tried out the patch yet). Is the coumarin a special-case bug? I noticed DeduceBondSystemTool was mentioned in the comments to another bug report. Should I be using this tool after parsing SMILES strings and mol files? Would that make a difference? Daylight's depictmatch service shows no hits for any of the four SMILES:

    Clc1ccc2nccn2c1            ClC1C=CC2=NC=CN2C=1
    FC(F)(F)c1oc2c2c(=O)c1     FC(F)(F)C1Oc2c2C(=O)C=1

--
This message and any files transmitted with it are the property of Sigma-Aldrich Corporation, are confidential, and are intended solely for the use of the person or entity to whom this e-mail is addressed. If you are not one of the named recipient(s) or otherwise have reason to believe that you have received this message in error, please contact the sender and delete this message immediately from your computer. Any other use, retention, dissemination, forwarding, printing, or copying of this e-mail is strictly prohibited.
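One way to see why the hits depend on aromaticity perception: in SMARTS, 'C' matches only aliphatic carbon, and '-' and '=' match only non-aromatic single and double bonds. If ring atoms and bonds are flagged aromatic, a Cl-C-C=C path through the ring cannot satisfy [F,Cl,Br,I]C-C=C; if the ring is left in Kekulé form, the alternating single/double bonds look like an alkene and the path matches. The sketch below is a hypothetical, library-free illustration of that reasoning only (the `Atom`/`Bond` types, `matchesPath` and the hand-built four-atom path are assumptions, not CDK or Daylight code).

```java
import java.util.Set;

final class AromaticityQueryDemo {

    // Hypothetical minimal atom and bond types for illustration only.
    static final class Atom {
        final String symbol;
        final boolean aromatic;
        Atom(String symbol, boolean aromatic) { this.symbol = symbol; this.aromatic = aromatic; }
    }

    static final class Bond {
        final int order;
        final boolean aromatic;
        Bond(int order, boolean aromatic) { this.order = order; this.aromatic = aromatic; }
    }

    static final Set<String> HALOGENS = Set.of("F", "Cl", "Br", "I");

    // SMARTS 'C': aliphatic carbon only.
    static boolean aliphaticC(Atom a) {
        return a.symbol.equals("C") && !a.aromatic;
    }

    /**
     * Checks the linear query [F,Cl,Br,I]C-C=C against a four-atom path a0-a1-a2-a3
     * joined by bonds b01, b12, b23. The halogen-C bond is unspecified in the query,
     * so it accepts a single or an aromatic bond.
     */
    static boolean matchesPath(Atom[] atoms, Bond[] bonds) {
        return HALOGENS.contains(atoms[0].symbol)
                && (bonds[0].order == 1 || bonds[0].aromatic)
                && aliphaticC(atoms[1])
                && bonds[1].order == 1 && !bonds[1].aromatic // '-'
                && aliphaticC(atoms[2])
                && bonds[2].order == 2 && !bonds[2].aromatic // '='
                && aliphaticC(atoms[3]);
    }

    // Builds Cl-C(ring)-C(ring)=C(ring) with ring atoms/bonds flagged as requested.
    static boolean demo(boolean ringPerceivedAromatic) {
        boolean r = ringPerceivedAromatic;
        Atom[] path = {new Atom("Cl", false), new Atom("C", r), new Atom("C", r), new Atom("C", r)};
        Bond[] bonds = {new Bond(1, false), new Bond(1, r), new Bond(2, r)};
        return matchesPath(path, bonds);
    }

    public static void main(String[] args) {
        System.out.println(demo(false)); // true: the Kekulé ring fragment looks like an alkene
        System.out.println(demo(true));  // false: aromatic ring atoms fail 'C'
    }
}
```

The symptom reported above is consistent with the Kekulé inputs not being perceived as aromatic before matching, although the sketch says nothing about why CDK 1.2.2 behaves that way.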