Hi All,
I am trying to standardize some SMIRKS patterns. Currently, after writing out a SMIRKS pattern, I split it into its individual molecules (SMARTS), parse each one into RDKit as a molecule, and write it back out as SMARTS. I then join the SMARTS strings back together to obtain the 'standardized' SMIRKS pattern.

In the examples below, the parts that differ between the two SMIRKS patterns, both before and after processing with RDKit, are the hydrogen-bearing ring nitrogens: [nH;+0:1] (atom map 1) in SMIRKS_A versus [nH;+0:3] (atom map 3) in SMIRKS_B, which RDKit writes out as [n&H1&+0:1] and [n&H1&+0:3]. I am aware that RDKit also changes the semicolon (";") to an ampersand ("&"), but I have not counted that as a difference.

Is there a way to standardize SMARTS patterns in RDKit? I know SMILES output can be canonicalized by turning on a flag, but I am not aware of an equivalent feature for SMARTS patterns.

Example SMIRKS patterns:

SMIRKS_A:
[c;H0;+0:6]-[c;H0;+0:5]1:[n;H0;+0:4]:[nH;+0:1]:[n;H0;+0:2]:[n;H0;+0:3]:1>>[N-;H0:1]=[N+;H0:2]=[N-;H0:3].[N;H0;+0:4]#[C;H0;+0:5]-[c;H0;+0:6]

SMIRKS_B:
[c;H0;+0:6]-[c;H0;+0:5]1:[n;H0;+0:4]:[n;H0;+0:1]:[n;H0;+0:2]:[nH;+0:3]:1>>[N-;H0:1]=[N+;H0:2]=[N-;H0:3].[N;H0;+0:4]#[C;H0;+0:5]-[c;H0;+0:6]

SMIRKS_A post RDKit:
[c&H0&+0:6]-[c&H0&+0:5]1:[n&H0&+0:4]:[n&H1&+0:1]:[n&H0&+0:2]:[n&H0&+0:3]:1>>[N&-&H0:1]=[N&+&H0:2]=[N&-&H0:3].[N&H0&+0:4]#[C&H0&+0:5]-[c&H0&+0:6]

SMIRKS_B post RDKit:
[c&H0&+0:6]-[c&H0&+0:5]1:[n&H0&+0:4]:[n&H0&+0:1]:[n&H0&+0:2]:[n&H1&+0:3]:1>>[N&-&H0:1]=[N&+&H0:2]=[N&-&H0:3].[N&H0&+0:4]#[C&H0&+0:5]-[c&H0&+0:6]

Thanks for the help,
-Amol
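For reference, the split / round-trip / join procedure described above looks roughly like the following minimal Python sketch. The helper names (split_smirks, join_smirks, standardize_smirks, rdkit_normalize) are hypothetical; only Chem.MolFromSmarts and Chem.MolToSmarts come from RDKit. It assumes ">" and "." appear only as reaction-role and component separators, which holds for the patterns above but not for reaction SMARTS that use component-level grouping parentheses such as "(C.C)".

```python
from __future__ import annotations

from typing import Callable


def split_smirks(smirks: str) -> list[list[str]]:
    """Split 'R1.R2>agents>P1.P2' into [reactants, agents, products].

    Assumes '>' and '.' only appear as role/component separators (true for
    the patterns in this post, not for grouped reaction SMARTS like '(C.C)').
    """
    sides = smirks.split(">")
    if len(sides) != 3:
        raise ValueError(f"expected exactly two '>' characters: {smirks!r}")
    # An empty role (e.g. no agents in 'A>>B') becomes an empty list.
    return [side.split(".") if side else [] for side in sides]


def join_smirks(sides: list[list[str]]) -> str:
    """Inverse of split_smirks: rejoin components with '.' and roles with '>'."""
    return ">".join(".".join(components) for components in sides)


def standardize_smirks(smirks: str, normalize: Callable[[str], str]) -> str:
    """Apply `normalize` to every component SMARTS, then rejoin the SMIRKS."""
    sides = split_smirks(smirks)
    return join_smirks([[normalize(s) for s in side] for side in sides])


def rdkit_normalize(smarts: str) -> str:
    """Round-trip one SMARTS string through RDKit (MolFromSmarts -> MolToSmarts)."""
    # Imported here so the pure-string helpers above work without RDKit installed.
    from rdkit import Chem

    mol = Chem.MolFromSmarts(smarts)
    if mol is None:
        raise ValueError(f"RDKit could not parse SMARTS: {smarts!r}")
    return Chem.MolToSmarts(mol)
```

Calling standardize_smirks(smirks_a, rdkit_normalize) is meant to mirror the per-molecule processing that produced the post-processing strings above. RDKit can also round-trip a whole reaction in one step with rdChemReactions.ReactionFromSmarts and rdChemReactions.ReactionToSmarts. Neither helper in this sketch attempts canonical atom ordering; it only reproduces the round trip.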
_______________________________________________ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-discuss