Hi RDKitters,

I have (yet another) question about the handling of SMARTS. I have been using 
a set of SMARTS (http://www.macinchem.org/reviews/pains/painsFilter.php) to 
run PAINS filters, but I have just discovered some strange behaviour: I would 
expect a match in the example below.

>>> from rdkit import Chem
>>> p = Chem.MolFromSmarts('[#6]-[#6](=[#16])-[#1]')
>>> m = Chem.MolFromSmiles('CC=S')
>>> m.HasSubstructMatch(p)
False

This can be fixed by using an alternative form of the SMARTS:

>>> p2 = Chem.MolFromSmarts('[#6]-[#6H](=[#16])')
>>> m.HasSubstructMatch(p2)
True

From some reading (I can no longer find the link), it seems that [#1] is 
reserved for more 'interesting' cases of hydrogen, for example:

>>> m = Chem.MolFromSmiles('CC(=S)[H]')
>>> m.HasSubstructMatch(p)
False
>>> m = Chem.MolFromSmiles('CC(=S)[2H]')
>>> m.HasSubstructMatch(p)
True

This also seems to change the behaviour of the examples which Greg posted in 
http://sourceforge.net/p/rdkit/mailman/message/31650578/

>>> p1=Chem.MolFromSmarts('c2sccc2[#1]')
>>> mol=Chem.MolFromSmiles('Clc2sccc2[H]')
>>> mol.HasSubstructMatch(p1)
False

Firstly, is this expected behaviour? It is different from what I would 
expect, and from how Pipeline Pilot behaves with SMARTS matching. Secondly, 
does anyone know how to get the expected behaviour without rewriting all the 
SMARTS?

Thanks in advance. Apologies for the long read.

Best,
Nick

Nicholas C. Firth | PhD Student | Cancer Therapeutics
The Institute of Cancer Research | 15 Cotswold Road | Belmont | Sutton | Surrey 
| SM2 5NG