[Rdkit-discuss] SMARTS/SMIRKS Canonicalisation in RdKit

amol.thakkar Thu, 06 Dec 2018 09:35:04 -0800

Hi All,


I am trying to standardize some SMIRKS patterns. Currently, after writing 
out the SMIRKS pattern, I split it into individual molecules (SMARTS), 
parse them into RDKit as molecules, and write them back out as SMARTS. 
I then join the SMARTS back together to obtain the 'standardized' SMIRKS 
pattern.
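
The round trip described above could be sketched roughly as follows. This is 
only an illustration, not necessarily the exact code in use: the helper names 
rdkit_roundtrip and roundtrip_smirks are hypothetical, the SMIRKS is assumed 
to contain exactly one ">>", and "." is assumed to separate molecules only 
(no component-level grouping). Chem.MolFromSmarts and Chem.MolToSmarts are the 
RDKit calls for parsing and writing SMARTS.

```python
def rdkit_roundtrip(smarts):
    """Parse one SMARTS with RDKit and write it back out as SMARTS."""
    # Imported here so the split/join logic below can be used without RDKit.
    from rdkit import Chem

    mol = Chem.MolFromSmarts(smarts)
    if mol is None:
        raise ValueError(f"RDKit could not parse SMARTS: {smarts!r}")
    return Chem.MolToSmarts(mol)


def roundtrip_smirks(smirks, roundtrip_smarts=rdkit_roundtrip):
    """Split a SMIRKS into component SMARTS, round-trip each, and rejoin."""
    # Assumption: exactly one ">>" separating reactants from products.
    reactants, products = smirks.split(">>")
    sides = []
    for side in (reactants, products):
        # Assumption: "." only separates molecules within one side.
        parts = [roundtrip_smarts(part) for part in side.split(".")]
        sides.append(".".join(parts))
    return ">>".join(sides)
```

Calling roundtrip_smirks(smirks) uses RDKit for each component. A different 
per-SMARTS function can be passed as roundtrip_smarts to try the splitting and 
joining on its own.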


In the examples below, I have highlighted (in red) the parts that differ 
between the two SMIRKS patterns, both before and after processing with RDKit: 
the hydrogen sits on the ring nitrogen with atom map 1 in SMIRKS_A and on the 
one with atom map 3 in SMIRKS_B. I am aware that RDKit has also changed the 
semi-colon (";") to the ampersand ("&"), but have not highlighted it as a 
change.
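
Since colour highlighting may not survive in a plain-text message, the 
differing atoms can also be located with a small script. The sketch below is 
plain Python string handling, not an RDKit feature; the function name 
differing_atoms is hypothetical, and it assumes both patterns list their 
bracket atoms in the same order, which holds for the two SMIRKS in this 
message.

```python
import re

# Matches one bracketed atom expression such as "[n;H0;+0:4]".
ATOM_TOKEN = re.compile(r"\[[^\]]*\]")


def differing_atoms(smirks_a, smirks_b):
    """Return (index, atom_a, atom_b) for bracket atoms that differ by position."""
    atoms_a = ATOM_TOKEN.findall(smirks_a)
    atoms_b = ATOM_TOKEN.findall(smirks_b)
    if len(atoms_a) != len(atoms_b):
        raise ValueError("patterns have different numbers of bracket atoms")
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(atoms_a, atoms_b))
        if a != b
    ]


SMIRKS_A = (
    "[c;H0;+0:6]-[c;H0;+0:5]1:[n;H0;+0:4]:[nH;+0:1]:[n;H0;+0:2]:[n;H0;+0:3]:1"
    ">>[N-;H0:1]=[N+;H0:2]=[N-;H0:3].[N;H0;+0:4]#[C;H0;+0:5]-[c;H0;+0:6]"
)
SMIRKS_B = (
    "[c;H0;+0:6]-[c;H0;+0:5]1:[n;H0;+0:4]:[n;H0;+0:1]:[n;H0;+0:2]:[nH;+0:3]:1"
    ">>[N-;H0:1]=[N+;H0:2]=[N-;H0:3].[N;H0;+0:4]#[C;H0;+0:5]-[c;H0;+0:6]"
)

for index, atom_a, atom_b in differing_atoms(SMIRKS_A, SMIRKS_B):
    print(index, atom_a, atom_b)
# prints:
# 3 [nH;+0:1] [n;H0;+0:1]
# 5 [n;H0;+0:3] [nH;+0:3]
```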


Is there a way to standardize the SMARTS pattern in RDKit?


I know you can canonicalize SMILES by turning the flag on, but I'm not 
aware of such a feature for SMARTS patterns.


Example SMARTS:



SMIRKS_A:

[c;H0;+0:6]-[c;H0;+0:5]1:[n;H0;+0:4]:[nH;+0:1]:[n;H0;+0:2]:[n;H0;+0:3]:1>>[N-;H0:1]=[N+;H0:2]=[N-;H0:3].[N;H0;+0:4]#[C;H0;+0:5]-[c;H0;+0:6]


SMIRKS_B:

[c;H0;+0:6]-[c;H0;+0:5]1:[n;H0;+0:4]:[n;H0;+0:1]:[n;H0;+0:2]:[nH;+0:3]:1>>[N-;H0:1]=[N+;H0:2]=[N-;H0:3].[N;H0;+0:4]#[C;H0;+0:5]-[c;H0;+0:6]


SMIRKS_A post RDKit:

[c&H0&+0:6]-[c&H0&+0:5]1:[n&H0&+0:4]:[n&H1&+0:1]:[n&H0&+0:2]:[n&H0&+0:3]:1>>[N&-&H0:1]=[N&+&H0:2]=[N&-&H0:3].[N&H0&+0:4]#[C&H0&+0:5]-[c&H0&+0:6]


SMIRKS_B post RDKit:

[c&H0&+0:6]-[c&H0&+0:5]1:[n&H0&+0:4]:[n&H0&+0:1]:[n&H0&+0:2]:[n&H1&+0:3]:1>>[N&-&H0:1]=[N&+&H0:2]=[N&-&H0:3].[N&H0&+0:4]#[C&H0&+0:5]-[c&H0&+0:6]


Thanks for the help,


-Amol
