Re: [Cdk-user] SMARTS matching after implicit to explict hydrogen conversion and SSSRing finder

2014-02-08 Thread John May
Hi Nick,

Yep that should work. Depending on what you’re trying to do though you might 
not need to extract the rings first… that is just run the SMARTS search. It can 
then give you the atoms of the ring.

Additionally 
- the new Cycles facade gives you much faster algorithms - 
Cycles.sssr(m.clone()).toRingSet()
- you may also consider if you really want to use the SSSR - 
http://www.jcheminf.com/content/6/1/3/abstract

J

On 8 Feb 2014, at 11:08, Nick Vandewiele nick.vandewi...@ugent.be wrote:

 Hi John!
 
 Thanks for the research on this. It would have taken me a long time to 
 find this out...
 
 so [C]1[C][C][C][C]1 is perceived as aromatic... this is in accordance with 
 the different behavior I see when I run the same code with six-rings instead 
 of five-rings. For 6-rings, there's no problem, presumably because it's not 
 perceived as aromatic.
 
 So what I do is first clone the original atomcontainer (to prevent it from 
 updating the implicit H-count), and then run the atom typing and adding 
 hydrogens on each of the IRings.
 
 IRingSet ringSet = new SSSRFinder(m.clone()).findSSSR(); // find SSSR rings
 for (IAtomContainer ring : ringSet.atomContainers()) {
     AtomContainerManipulator.percieveAtomTypesAndConfigureAtoms(ring);
     CDKHydrogenAdder.getInstance(blr).addImplicitHydrogens(ring);
     boolean found = sqt.matches(ring); // true
 }
 
 regards,
 Nick
 
 
 From: John May [john...@ebi.ac.uk]
 Sent: Friday, February 07, 2014 6:48 PM
 To: Nick Vandewiele
 Cc: cdk-user@lists.sourceforge.net
 Subject: Re: [Cdk-user] SMARTS matching after implicit to explict hydrogen 
 conversion and SSSRing finder
 
 Okay now I’ve actually tracked it down - the issue is to do with aromaticity 
 (kind of) and the SSSR providing a container for the ring atoms/bonds.
 
 With implicit hydrogens the substructure from the SSSRFinder looks like this…
 
 [CH]1[CH2][CH2][CH2][CH2]1
 
 Note that C1(O)CCCC1 is really [CH]1([OH])[CH2][CH2][CH2][CH2]1. In the CDK, 
 removing atoms doesn’t update neighbour hydrogen counts, hence the first 
 carbon keeps an implicit hydrogen count of 1.
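 
The stale hydrogen count can be pictured with a small toy model. This is only a sketch in plain Java, not CDK code: the `Mol` class and its `addAtom`, `addBond`, `valence` and `carbonsOnly` methods are hypothetical names made up for the illustration, and all bonds are assumed to be single.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: none of these names are CDK API.
public class StaleHydrogenDemo {

    /** Element symbols, stored implicit H counts, and single bonds as index pairs. */
    public static final class Mol {
        final List<String> symbols = new ArrayList<>();
        final List<Integer> implicitH = new ArrayList<>();
        final List<int[]> bonds = new ArrayList<>();

        public int addAtom(String symbol, int h) {
            symbols.add(symbol);
            implicitH.add(h);
            return symbols.size() - 1;
        }

        public void addBond(int a, int b) {
            bonds.add(new int[]{a, b});
        }

        /** Valence of atom i: bonds present in this container plus the stored H count. */
        public int valence(int i) {
            int v = implicitH.get(i);
            for (int[] b : bonds) {
                if (b[0] == i || b[1] == i) v++;
            }
            return v;
        }

        /** Keeps only the carbons and the bonds between them; H counts are copied unchanged. */
        public Mol carbonsOnly() {
            Mol sub = new Mol();
            int[] map = new int[symbols.size()];
            for (int i = 0; i < symbols.size(); i++) {
                map[i] = symbols.get(i).equals("C") ? sub.addAtom("C", implicitH.get(i)) : -1;
            }
            for (int[] b : bonds) {
                if (map[b[0]] >= 0 && map[b[1]] >= 0) sub.addBond(map[b[0]], map[b[1]]);
            }
            return sub;
        }
    }

    /** Cyclopentanol, C1C(O)CCC1, with implicit hydrogens stored on each heavy atom. */
    public static Mol cyclopentanol() {
        Mol m = new Mol();
        int c1 = m.addAtom("C", 2);
        int c2 = m.addAtom("C", 1); // the carbon carrying the OH
        int o = m.addAtom("O", 1);
        int c3 = m.addAtom("C", 2);
        int c4 = m.addAtom("C", 2);
        int c5 = m.addAtom("C", 2);
        m.addBond(c1, c2);
        m.addBond(c2, o);
        m.addBond(c2, c3);
        m.addBond(c3, c4);
        m.addBond(c4, c5);
        m.addBond(c5, c1);
        return m;
    }

    public static void main(String[] args) {
        Mol full = cyclopentanol();
        Mol ring = full.carbonsOnly();
        System.out.println("valence of C(OH) in molecule: " + full.valence(1));
        System.out.println("valence of same carbon in ring: " + ring.valence(1));
    }
}
```

In the full molecule the hydroxyl-bearing carbon has a valence of four, but once the oxygen is dropped without touching the stored hydrogen count it is left one short, which is the under-saturated [CH] described above.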
 
 When all hydrogens are explicit we get
 
 [C]1[C][C][C][C]1
 
 For some reason the aromaticity algorithm finds it to be aromatic. I can fix 
 that, but for now you can update the valences (i.e. AtomType/AddHydrogens). 
 But consider this: the atoms in the IRing are the same objects as in the 
 molecule, so adjusting the hydrogen count for the ring atoms would also affect 
 the parent molecule. You can even run it and you’ll get:
 
 [CH2]1([CH2](O[H])([CH2]([CH2]([CH2]1([H])[H])([H])[H])([H])[H])[H])([H])[H]
 
 It would be even worse when there are multiple rings. I’ve never liked IRing 
 anyway - much better to refer to rings by index without creating a new 
 container.
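 
The point about shared atoms, and why cloning first or keeping rings as indices avoids the side effect, can be sketched with a toy model. This is plain Java rather than CDK code, and `Atom`, `ringView` and `deepCopy` are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: Atom, ringView and deepCopy are hypothetical names, not CDK API.
public class SharedRingAtomsDemo {

    /** Minimal atom carrying just an implicit hydrogen count. */
    public static final class Atom {
        public int implicitH;
        public Atom(int implicitH) { this.implicitH = implicitH; }
    }

    /** Heavy atoms of cyclopentanol, C1C(O)CCC1, in SMILES order (index 2 is the oxygen). */
    public static List<Atom> cyclopentanolAtoms() {
        int[] hCounts = {2, 1, 1, 2, 2, 2};
        List<Atom> atoms = new ArrayList<>();
        for (int h : hCounts) atoms.add(new Atom(h));
        return atoms;
    }

    /** A ring "container" that references the parent's atom objects rather than copying them. */
    public static List<Atom> ringView(List<Atom> parent, int[] ringIdx) {
        List<Atom> ring = new ArrayList<>();
        for (int i : ringIdx) ring.add(parent.get(i));
        return ring;
    }

    /** Copies every atom, playing the role of cloning the molecule before ring perception. */
    public static List<Atom> deepCopy(List<Atom> atoms) {
        List<Atom> copy = new ArrayList<>();
        for (Atom a : atoms) copy.add(new Atom(a.implicitH));
        return copy;
    }

    public static void main(String[] args) {
        List<Atom> parent = cyclopentanolAtoms();
        int[] ring = {0, 1, 3, 4, 5}; // the ring kept as plain indices into the parent

        // Editing a ring built on a copy leaves the parent untouched.
        ringView(deepCopy(parent), ring).get(1).implicitH = 2;
        System.out.println("after editing copy: " + parent.get(1).implicitH);

        // Editing a ring that shares atoms silently changes the parent too.
        ringView(parent, ring).get(1).implicitH = 2;
        System.out.println("after editing shared ring: " + parent.get(1).implicitH);
    }
}
```

Keeping the ring as an `int[]` of indices, as in `main`, means there is no second container whose hydrogen counts could drift away from, or leak into, the parent.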
 
 Cheers,
 J
 
 On 7 Feb 2014, at 17:12, John May john...@ebi.ac.uk wrote:
 
 Doh - of course. So the SMARTS has a quirk that ‘C1’ matches the 
 ‘CDKConstants.ISINRING’ flag.
 
 We can fix this without a patch - just add this before you match. The 
 SMARTSQueryTool should be doing it already - not sure why it isn’t though…. 
 (that’s the bug)
 
 SmartsMatchers.prepare(ring, true);
 
 On 7 Feb 2014, at 17:03, John May john...@ebi.ac.uk wrote:
 
 No problem,
 
 master, but nothing should have changed…
 
 J
 
 On 7 Feb 2014, at 16:47, Nick Vandewiele nick.vandewi...@ugent.be wrote:
 
 John,
 
 Thanks for the fast response!
 
 However: adding or removing dashes in the SMARTS string doesn’t change the 
 outcome when I try it.
 
 Also, using your proposed alternative, e.g.:
 Pattern pattern = Ullmann.findSubstructure(SMARTSParser.parse(smarts, blr));
 for (IAtomContainer ring : ringSet.atomContainers()) {
     System.out.println(pattern.matches(ring));
 }
 
 This does not change the outcome (i.e. false) for me either.
 Are you using the 1.5.4 or master branch?
 
 Regards,
 Nick
 
 From: John May [mailto:john...@ebi.ac.uk]
 Sent: Friday, February 07, 2014 5:28 PM
 To: Nick Vandewiele
 Cc: cdk-user@lists.sourceforge.net
 Subject: Re: [Cdk-user] SMARTS matching after implicit to explict hydrogen 
 conversion and SSSRing finder
 
 Okay it’s the bond matching… C-1-C-C-C-C1 works but C1-C-C-C-C1 doesn’t.
 
 Should be an easy fix.
 
 J
 
 On 7 Feb 2014, at 16:03, Nick Vandewiele nick.vandewi...@ugent.be wrote:
 
 
 Hi,
 
 I am using CDK 1.5.4 and detected some behavior of the SMARTS matcher that I 
 didn’t quite understand.
 When I search for a SMARTS pattern in one of the rings detected using the 
 SSSRFinder algorithm, the success of finding the pattern in the ring depends 
 on whether implicit hydrogens were converted to explicit ones, or not.
 If explicit hydrogens are present, the pattern is not found. If only implicit 
 hydrogens are present, the pattern IS found

Re: [Cdk-user] SMARTS matching after implicit to explict hydrogen conversion and SSSRing finder

2014-02-07 Thread John May
No problem,

master, but nothing should have changed… 

J

On 7 Feb 2014, at 16:47, Nick Vandewiele nick.vandewi...@ugent.be wrote:

 John,
  
 Thanks for the fast response!
  
 However: adding or removing dashes in the SMARTS string doesn’t change the 
 outcome when I try it.
  
 Also, using your proposed alternative, e.g.:
 Pattern pattern = Ullmann.findSubstructure(SMARTSParser.parse(smarts, blr));
 for (IAtomContainer ring : ringSet.atomContainers()) {
     System.out.println(pattern.matches(ring));
 }
 
 This does not change the outcome (i.e. false) for me either.
 Are you using the 1.5.4 or master branch?
  
 Regards,
 Nick
  
 From: John May [mailto:john...@ebi.ac.uk] 
 Sent: Friday, February 07, 2014 5:28 PM
 To: Nick Vandewiele
 Cc: cdk-user@lists.sourceforge.net
 Subject: Re: [Cdk-user] SMARTS matching after implicit to explict hydrogen 
 conversion and SSSRing finder
  
 Okay it’s the bond matching… C-1-C-C-C-C1 works but C1-C-C-C-C1 doesn’t.
  
 Should be an easy fix.
  
 J
  
 On 7 Feb 2014, at 16:03, Nick Vandewiele nick.vandewi...@ugent.be wrote:
 
 
 Hi,
  
 I am using CDK 1.5.4 and detected some behavior of the SMARTS matcher that I 
 didn’t quite understand.
 When I search for a SMARTS pattern in one of the rings detected using the 
 SSSRFinder algorithm, the success of finding the pattern in the ring depends 
 on whether implicit hydrogens were converted to explicit ones, or not.
 If explicit hydrogens are present, the pattern is not found. If only implicit 
 hydrogens are present, the pattern IS found.
  
 This code was used:
  
 String smiles = "C1C(O)CCC1";
 IChemObjectBuilder blr = SilentChemObjectBuilder.getInstance();
 SmilesParser smipar = new SmilesParser(blr);
 IAtomContainer m = smipar.parseSmiles(smiles);
 String smarts = "C1-C-C-C-C1";
 SMARTSQueryTool sqt = new SMARTSQueryTool(smarts, blr);

 AtomContainerManipulator.convertImplicitToExplicitHydrogens(m);
 IRingSet ringSet = new SSSRFinder(m).findSSSR(); // find SSSR rings

 for (IAtomContainer ring : ringSet.atomContainers()) {
     boolean found = sqt.matches(ring); // false (should be true)
 }
  
 Although the release notes of 1.5.4 are very informative, I couldn’t find an 
 answer explaining this behavior.
  
 So my question is two-fold:
 1)  how do I ensure that the pattern is found, even when explicit 
 hydrogens are used in the atomcontainer?
 2)  What is happening underneath the hood here? Is this behavior normal?
  
 Regards,
 Nick
  
 --
 Managing the Performance of Cloud-Based Applications
 Take advantage of what the Cloud has to offer - Avoid Common Pitfalls.
 Read the Whitepaper.
 http://pubads.g.doubleclick.net/gampad/clk?id=121051231iu=/4140/ostg.clktrk___
 Cdk-user mailing list
 Cdk-user@lists.sourceforge.net
 https://lists.sourceforge.net/lists/listinfo/cdk-user



Re: [Cdk-user] SMARTS matching after implicit to explict hydrogen conversion and SSSRing finder

2014-02-07 Thread John May
Confirm - I’ll look into it tonight.

For now if you don’t need the SMARTS stereochemistry you can actually use the 
Pattern directly with the query container. This gives the correct answer:

Pattern pattern = Pattern.findSubstructure(SMARTSParser.parse(smarts, blr));

for (IAtomContainer ring : ringSet.atomContainers()) {
    System.out.println(pattern.matches(ring));
}

On 7 Feb 2014, at 16:03, Nick Vandewiele nick.vandewi...@ugent.be wrote:

 Hi,
  
 I am using CDK 1.5.4 and detected some behavior of the SMARTS matcher that I 
 didn’t quite understand.
 When I search for a SMARTS pattern in one of the rings detected using the 
 SSSRFinder algorithm, the success of finding the pattern in the ring depends 
 on whether implicit hydrogens were converted to explicit ones, or not.
 If explicit hydrogens are present, the pattern is not found. If only implicit 
 hydrogens are present, the pattern IS found.
  
 This code was used:
  
 String smiles = "C1C(O)CCC1";
 IChemObjectBuilder blr = SilentChemObjectBuilder.getInstance();
 SmilesParser smipar = new SmilesParser(blr);
 IAtomContainer m = smipar.parseSmiles(smiles);
 String smarts = "C1-C-C-C-C1";
 SMARTSQueryTool sqt = new SMARTSQueryTool(smarts, blr);

 AtomContainerManipulator.convertImplicitToExplicitHydrogens(m);
 IRingSet ringSet = new SSSRFinder(m).findSSSR(); // find SSSR rings

 for (IAtomContainer ring : ringSet.atomContainers()) {
     boolean found = sqt.matches(ring); // false (should be true)
 }
  
 Although the release notes of 1.5.4 are very informative, I couldn’t find an 
 answer explaining this behavior.
  
 So my question is two-fold:
 1)  how do I ensure that the pattern is found, even when explicit 
 hydrogens are used in the atomcontainer?
 2)  What is happening underneath the hood here? Is this behavior normal?
  
 Regards,
 Nick
  


Re: [Cdk-user] SMARTS matching after implicit to explict hydrogen conversion and SSSRing finder

2014-02-07 Thread Nick Vandewiele
John,

Thanks for the fast response!

However: adding or removing dashes in the SMARTS string doesn't change the 
outcome when I try it.

Also, using your proposed alternative, e.g.:
Pattern pattern = Ullmann.findSubstructure(SMARTSParser.parse(smarts, blr));
for (IAtomContainer ring : ringSet.atomContainers()) {
    System.out.println(pattern.matches(ring));
}

This does not change the outcome (i.e. false) for me either.
Are you using the 1.5.4 or master branch?

Regards,
Nick

From: John May [mailto:john...@ebi.ac.uk]
Sent: Friday, February 07, 2014 5:28 PM
To: Nick Vandewiele
Cc: cdk-user@lists.sourceforge.net
Subject: Re: [Cdk-user] SMARTS matching after implicit to explict hydrogen 
conversion and SSSRing finder

Okay it's the bond matching... C-1-C-C-C-C1 works but C1-C-C-C-C1 doesn't.

Should be an easy fix.

J

On 7 Feb 2014, at 16:03, Nick Vandewiele nick.vandewi...@ugent.be wrote:


Hi,

I am using CDK 1.5.4 and detected some behavior of the SMARTS matcher that I 
didn't quite understand.
When I search for a SMARTS pattern in one of the rings detected using the 
SSSRFinder algorithm, the success of finding the pattern in the ring depends on 
whether implicit hydrogens were converted to explicit ones, or not.
If explicit hydrogens are present, the pattern is not found. If only implicit 
hydrogens are present, the pattern IS found.

This code was used:

String smiles = "C1C(O)CCC1";
IChemObjectBuilder blr = SilentChemObjectBuilder.getInstance();
SmilesParser smipar = new SmilesParser(blr);
IAtomContainer m = smipar.parseSmiles(smiles);
String smarts = "C1-C-C-C-C1";
SMARTSQueryTool sqt = new SMARTSQueryTool(smarts, blr);

AtomContainerManipulator.convertImplicitToExplicitHydrogens(m);
IRingSet ringSet = new SSSRFinder(m).findSSSR(); // find SSSR rings

for (IAtomContainer ring : ringSet.atomContainers()) {
    boolean found = sqt.matches(ring); // false (should be true)
}

Although the release notes of 1.5.4 are very informative, I couldn't find an 
answer explaining this behavior.

So my question is two-fold:
1)  how do I ensure that the pattern is found, even when explicit hydrogens 
are used in the atomcontainer?
2)  What is happening underneath the hood here? Is this behavior normal?

Regards,
Nick



Re: [Cdk-user] SMARTS Matching

2011-10-17 Thread Loren Lenzen
Hi Lochana,

The problem is with your SMARTS query.  The dollar sign means recursive 
matching and the X means the number of connections. 
So you are asking for any atom that is connected to an aliphatic nitrogen 
with 4 other atoms attached to the nitrogen, or an atom that is connected 
to an aromatic nitrogen with 4 other atoms attached to the nitrogen.  I 
cannot think of any compounds that have an aromatic nitrogen connected to 
4 atoms.  What you probably want to check for is valence (which is a v 
instead of an X).  Also, there is no reason to use a recursive query here, 
nor is there a reason to have two separate queries for the aromatic and 
aliphatic cases (the atomic number can be used instead, which matches both).

This is the query I would use:   [#7v4]
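
To make the difference between X and v concrete, here is a small sketch in plain Java (not CDK code; the class and method names are made up for the illustration). It assumes the Daylight definitions: X counts total connections and v sums total bond order, with each implicit hydrogen counted as one single bond in both.

```java
// Toy calculation only: these helpers are hypothetical and not part of CDK.
public class ConnectivityVsValence {

    /** SMARTS X: explicit bonds plus implicit hydrogens, regardless of bond order. */
    public static int connectivity(int[] bondOrders, int implicitH) {
        return bondOrders.length + implicitH;
    }

    /** SMARTS v: sum of bond orders, counting each implicit hydrogen as a single bond. */
    public static int valence(int[] bondOrders, int implicitH) {
        int sum = implicitH;
        for (int order : bondOrders) sum += order;
        return sum;
    }

    public static void main(String[] args) {
        int[] ammonium = {1, 1, 1, 1}; // N in C[N+](C)(C)C
        int[] nitro = {2, 1, 1};       // N in C[N+](=O)[O-]
        int[] amine = {1, 1, 1};       // N in CN(C)C

        System.out.println("ammonium: X=" + connectivity(ammonium, 0) + " v=" + valence(ammonium, 0));
        System.out.println("nitro:    X=" + connectivity(nitro, 0) + " v=" + valence(nitro, 0));
        System.out.println("amine:    X=" + connectivity(amine, 0) + " v=" + valence(amine, 0));
    }
}
```

Under these definitions a charge-separated nitro nitrogen has only three connections but a valence of four, so a v4 query would be expected to pick it up while an X4 query would not.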

The CDK implementation of SMARTS is pretty good.  I don't believe 
chirality or regiochemistry has been included yet, but otherwise 
http://www.daylight.com/dayhtml_tutorials/languages/smarts/index.html has 
a very good tutorial.
 



From:
cdk-user-requ...@lists.sourceforge.net
To:
cdk-user@lists.sourceforge.net
Date:
10/15/2011 07:05 AM
Subject:
Cdk-user Digest, Vol 65, Issue 8



Send Cdk-user mailing list submissions to
 cdk-user@lists.sourceforge.net

To subscribe or unsubscribe via the World Wide Web, visit
 https://lists.sourceforge.net/lists/listinfo/cdk-user
or, via email, send a message with subject or body 'help' to
 cdk-user-requ...@lists.sourceforge.net

You can reach the person managing the list at
 cdk-user-ow...@lists.sourceforge.net

When replying, please edit your Subject line so it is more specific
than Re: Contents of Cdk-user digest...


Today's Topics:

   1. SMARTS Matching (lochana menikarachchi)
   2. Re: SMARTS Matching (Rajarshi Guha)


--

Message: 1
Date: Fri, 14 Oct 2011 08:10:50 -0700 (PDT)
From: lochana menikarachchi locha...@yahoo.com
Subject: [Cdk-user] SMARTS Matching
To: cdk-user@lists.sourceforge.net cdk-user@lists.sourceforge.net
Message-ID:
 1318605050.62208.yahoomail...@web114512.mail.gq1.yahoo.com
Content-Type: text/plain; charset=iso-8859-1

Hi All,

I was trying to use SMARTSQueryTool to match all aliphatic and aromatic N 
with 4 connections.

[$([NX4+]),$([nX4+])]

The molecules were read from PubChem SDF files. The query gives me only 
aliphatic, it also gives me things with 3 bonds say N connected to a =O 
and 2 others ?

Thanks.



__
Lochana C. Menikarachchi
Post Doctoral Research Fellow
Department of Pharmaceutical Sciences
School of Pharmacy
69, North Eagleville Rd, Unit 3092
University of Connecticut
Storrs, CT 06269-3092
Lab: 860-486-1591
Home: 860-450-1335

--

Message: 2
Date: Fri, 14 Oct 2011 11:20:13 -0400
From: Rajarshi Guha rajarshi.g...@gmail.com
Subject: Re: [Cdk-user] SMARTS Matching
To: lochana menikarachchi locha...@yahoo.com
Cc: cdk-user@lists.sourceforge.net cdk-user@lists.sourceforge.net
Message-ID:
 CAC8sKVf9rqgGOTJ8O=kp242mrqnmgzbs8d0ctswmmdch5xn...@mail.gmail.com
Content-Type: text/plain; charset=ISO-8859-1

please send examples of molecules that don't match the patterns

On Fri, Oct 14, 2011 at 11:10 AM, lochana menikarachchi
locha...@yahoo.com wrote:
 Hi All,

 I was trying to use SMARTSQueryTool to match all aliphatic and aromatic N
 with 4 connections.

 [$([NX4+]),$([nX4+])]

 The molecules were read from PubChem SDF files. The query gives me only
 aliphatic, it also gives me things with 3 bonds say N connected to a =O and
 2 others ?

 Thanks.


 __
 Lochana C. Menikarachchi
 Post Doctoral Research Fellow
 Department of Pharmaceutical Sciences
 School of Pharmacy
 69, North Eagleville Rd, Unit 3092
 University of Connecticut
 Storrs, CT 06269-3092
 Lab: 860-486-1591
 Home: 860-450-1335
 
--
 All the data continuously generated in your IT infrastructure contains a
 definitive record of customers, application performance, security
 threats, fraudulent activity and more. Splunk takes this data and makes
 sense of it. Business sense. IT sense. Common sense.
 http://p.sf.net/sfu/splunk-d2d-oct
 ___
 Cdk-user mailing list
 Cdk-user@lists.sourceforge.net
 https://lists.sourceforge.net/lists/listinfo/cdk-user





-- 
Rajarshi Guha
NIH Chemical Genomics Center




[Cdk-user] SMARTS Matching

2011-10-14 Thread lochana menikarachchi
Hi All,

 
I was trying to use SMARTSQueryTool to match all aliphatic and aromatic N with 
4 connections.

 [$([NX4+]),$([nX4+])]

 The molecules were read from PubChem SDF files. The query gives me only 
aliphatic, it also gives me things with 3 bonds say N connected to a =O and 2 
others ?

Thanks.



__
Lochana C. Menikarachchi
Post Doctoral Research Fellow
Department of Pharmaceutical Sciences
School of Pharmacy
69, North Eagleville Rd, Unit 3092
University of Connecticut
Storrs, CT 06269-3092
Lab: 860-486-1591
Home: 860-450-1335


[Cdk-user] SMARTS matching and aromaticity detection

2009-07-21 Thread Loren Lenzen
Using CDK 1.2.2 SMARTSQueryTool and SmilesParser.

The SMARTS query [F,Cl,Br,I]C-C=C doesn't hit the left two SMILES below, 
but does hit the right two.  I have also found that the query hits the SD 
file related to the right two SMILES using SDV2000Reader, aromaticity 
detection, and the hydrogen adder (I am unsure whether the SD file was 
created from the SMILES or vice versa).  Clearly, 
the query shouldn't hit any of these.  There seems to be a problem with 
aromaticity detection.  I have looked through the bug reports and noticed 
that Egon's recent patch for sp2 N's might work for the first issue 
(haven't tried out the patch yet).  Is the coumarin a special case bug?  I 
noticed DeduceBondSystemTool was mentioned in the comments to another bug 
report.  Should I be using this tool after parsing SMILES strings and mol 
files?  Would that make a difference?  Daylight's depictmatch service 
shows no hits for all four smiles.

Clc1ccc2nccn2c1  ClC1C=CC2=NC=CN2C=1
FC(F)(F)c1oc2c2c(=O)c1 -- FC(F)(F)C1Oc2c2C(=O)C=1

This message and any files transmitted with it are the property of
Sigma-Aldrich Corporation, are confidential, and are intended
solely for the use of the person or entity to whom this e-mail is
addressed.  If you are not one of the named recipient(s) or
otherwise have reason to believe that you have received this
message in error, please contact the sender and delete this message
immediately from your computer.  Any other use, retention,
dissemination, forwarding, printing, or copying of this e-mail is
strictly prohibited.