Actually, Greg, I think there is a way to win and I think you have in fact won.

I was asking myself what behavior one would actually require of a
canonical SMILES containing wildcards.

What I came up with is: if you replace the wildcards in the canonical
SMILES with any atoms that could result in a legal structure, you
should be able to recanonicalize the result to a legal SMILES.

For the current example, [*]1cccc1 where the wildcard is a C, C1cccc1
canonicalizes to C1=CCC=C1, which meets the criterion. So, to me, this
is in fact a win. If it had resulted in an error (because the starting
SMILES contains aromatic atoms, but the ring cannot be aromatic), I'd
have regarded it as a loss.

(By the above criterion, a kit that did regard C1cccc1 as illegal
would have to canonicalize *1cccc1 as *1C=CC=C1 or something similar.)
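
To make the criterion concrete, here is a rough sketch of how one
might test it. The helper names (substitute_wildcards, check_criterion,
rdkit_canonicalize) are made up for illustration and are not part of
RDKit. The wildcard replacement is a naive string substitution that
only handles a bare * or [*], and the RDKit wrapper assumes RDKit is
installed and uses only Chem.MolFromSmiles and Chem.MolToSmiles.

```python
import re
from typing import Callable, Dict, Iterable, Optional

# Matches a bracketed wildcard "[*]" or a bare "*".
# Naive: wildcards with extra bracket content are not treated specially.
WILDCARD = re.compile(r"\[\*\]|\*")


def substitute_wildcards(smiles: str, replacement: str) -> str:
    """Replace every wildcard in a SMILES string with one atom token."""
    return WILDCARD.sub(replacement, smiles)


def check_criterion(
    canonical_smiles: str,
    replacements: Iterable[str],
    canonicalize: Callable[[str], Optional[str]],
) -> Dict[str, Optional[str]]:
    """Map each substituted SMILES to its recanonicalized form.

    A value of None means the canonicalizer rejected the substituted
    SMILES, i.e. the criterion fails for that replacement atom.
    """
    results = {}
    for atom in replacements:
        candidate = substitute_wildcards(canonical_smiles, atom)
        results[candidate] = canonicalize(candidate)
    return results


def rdkit_canonicalize(smiles: str) -> Optional[str]:
    """Hypothetical wrapper: canonicalize with RDKit, None on failure."""
    from rdkit import Chem  # imported lazily; requires RDKit

    mol = Chem.MolFromSmiles(smiles)  # returns None if parsing fails
    return None if mol is None else Chem.MolToSmiles(mol)
```

With RDKit available, calling check_criterion("[*]1cccc1", ["C",
"[CH-]"], rdkit_canonicalize) would cover the two substitutions
discussed in this thread; any None in the result would count as a loss
by the criterion above.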

Anyway, I apologize for getting rather arcane here. Separately, I
think I have found an example of two equivalent SMILES for a real
molecule (no wildcards) that canonicalize differently in RDKit. I'll
start a separate thread for this.

-P.


On Tue, Jun 16, 2015 at 12:36 AM, Greg Landrum <[email protected]> wrote:
> On Mon, Jun 15, 2015 at 6:11 PM, Peter Shenkin <[email protected]> wrote:
...
>> Pursuing another remark you made, RDKit canonicalizes C1=C*C=C1 as
>> [*]1cccc1. This may also be unwarranted, because the wildcard could be
>> another C, in which case the structure would not be aromatic.
>
>
> That's correct.[1] There's not really a right answer when treating molecules
> with query features as "real" molecules, this is just the convention that
> the RDKit takes when canonicalizing structures containing dummy atoms.
>
> -greg
> [1] Though, technically, if the * is a [CH-], then the ring would be
> aromatic again. There's no way to win here. :-)

------------------------------------------------------------------------------
_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
