Actually, Greg, I think there is a way to win and I think you have in fact won.

I was asking myself what behavior one would actually require of a canonical SMILES containing wildcards. What I came up with is: if you replace the wildcards in the canonical SMILES with any atoms that could result in a legal structure, you should be able to recanonicalize the result to a legal SMILES.

For the current example, [*]1cccc1 where the wildcard is a C, C1cccc1 canonicalizes to CC1=CC=C1, which meets the criterion. So, to me, this is in fact a win. If it had resulted in an error (because the starting SMILES contains aromatic atoms, but cannot be aromatic), I'd have regarded it as a loss.

(By the above criterion, a kit that did regard C1cccc1 as illegal would have to canonicalize *1cccc1 as *C1=CC=C1 or something similar.)

Anyway, I apologize for getting rather arcane here.

Separately, I think I have found an example of two equivalent SMILES for a real molecule (no wildcards) that canonicalize differently in RDKit. I'll start a separate thread for this.

-P.

On Tue, Jun 16, 2015 at 12:36 AM, Greg Landrum wrote:
> On Mon, Jun 15, 2015 at 6:11 PM, Peter Shenkin wrote:
> ...
>> Pursuing another remark you made, RDKit canonicalizes C1=C*C=C1 as
>> [*]1cccc1. This may also be unwarranted, because the wildcard could be
>> another C, in which case the structure would not be aromatic.
>
> That's correct.[1] There's not really a right answer when treating molecules
> with query features as "real" molecules, this is just the convention that
> the RDKit takes when canonicalizing structures containing dummy atoms.
>
> -greg
> [1] Though, technically, if the * is a [CH-], then the ring would be
> aromatic again. There's no way to win here. :-)
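
For anyone who wants to try the recanonicalization criterion from the top of this message, here is a minimal sketch, not a definitive implementation. The helper names (substitute_wildcards, check_recanonicalization) are made up for illustration, and the canonicalizer is passed in as a plain callable rather than tied to any toolkit. It assumes wildcards appear only as "[*]" or a bare "*", and it replaces every wildcard with the same atom.

```python
import re
from typing import Callable, Dict, Iterable, Optional

# Matches a bracketed wildcard "[*]" or a bare "*".
# Assumption: richer dummy-atom forms such as "[*:1]" are not handled.
_WILDCARD = re.compile(r"\[\*\]|\*")


def substitute_wildcards(smiles: str, replacement: str) -> str:
    """Replace every wildcard atom in `smiles` with `replacement`."""
    # A function is used as the replacement so the text is inserted literally.
    return _WILDCARD.sub(lambda _match: replacement, smiles)


def check_recanonicalization(
    canonical_smiles: str,
    replacements: Iterable[str],
    canonicalize: Callable[[str], Optional[str]],
) -> Dict[str, Optional[str]]:
    """Map each substituted SMILES to its recanonicalized form.

    A value of None means the (hypothetical) `canonicalize` callable
    rejected the substituted SMILES, i.e. the criterion is not met
    for that replacement atom.
    """
    results: Dict[str, Optional[str]] = {}
    for atom in replacements:
        candidate = substitute_wildcards(canonical_smiles, atom)
        try:
            results[candidate] = canonicalize(candidate)
        except Exception:
            # Treat any failure to canonicalize as a "loss".
            results[candidate] = None
    return results
```

With RDKit, one could pass a small wrapper around Chem.MolFromSmiles and Chem.MolToSmiles as the canonicalize argument. MolFromSmiles returns None for SMILES it cannot parse, so the wrapper should either return None or raise in that case; both are recorded as a failure here.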

