Re: [Rdkit-discuss] RDKit appears to be parsing SMILES stereochemistry differently

2017-11-09 Thread Andrew Dalke
On Nov 9, 2017, at 21:49, Brian Cole wrote:
> Certainly, but thousands of lines of Python doesn't fit in an email in an
> easily digestible way. :-)

I'll restate things since I wasn't clear. While this step may be what you need for the way you structure things, there might be a better way to st

Re: [Rdkit-discuss] RDKit appears to be parsing SMILES stereochemistry differently

2017-11-09 Thread Brian Cole
> Somehow you got the code to generate a "9" for that ring closure, which is
> not something that RDKit does naturally, so we are only seeing a step in
> the larger part of your goal.

Certainly, but thousands of lines of Python doesn't fit in an email in an easily digestible way. :-)

> Since

Re: [Rdkit-discuss] RDKit appears to be parsing SMILES stereochemistry differently

2017-11-09 Thread Andrew Dalke
On Nov 9, 2017, at 16:09, Brian Cole wrote:
> Here's an example of why this is useful at maintaining molecular
> fragmentation inside your molecular representation:
>
> >>> from rdkit import Chem
> >>> smiles = 'F9.[C@]91(C)CCO1'
> >>> fluorine, core = smiles.split('.')
> >>> fluorine
> 'F9

Re: [Rdkit-discuss] RDKit appears to be parsing SMILES stereochemistry differently

2017-11-09 Thread Chris Earnshaw
Trouble is, you're mixing chemical operations and lexical ones. It might be handy if this 'just worked' but in practice it's not going to produce valid SMILES without more work. I've written code in the past to do this kind of thing for virtual library building, using dummy atoms to mark link posi

Re: [Rdkit-discuss] RDKit appears to be parsing SMILES stereochemistry differently

2017-11-09 Thread Brian Cole
Here's an example of why this is useful at maintaining molecular fragmentation inside your molecular representation:

>>> from rdkit import Chem
>>> smiles = 'F9.[C@]91(C)CCO1'
>>> fluorine, core = smiles.split('.')
>>> fluorine
'F9'
>>> fragment = core.replace('9', '([*:9])')
>>> fragment
'[C@]([*
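The lexical step in the message above uses only Python string operations, so it can be reproduced without RDKit. The following is a minimal sketch using the SMILES from the post. It shows only the text substitution and makes no claim about how any toolkit interprets the stereochemistry of the resulting string, which is the question the thread is debating.

```python
# SMILES from the post: a fluorine joined to the core through ring closure 9.
smiles = 'F9.[C@]91(C)CCO1'

# Split the dot-disconnected string into its two textual components.
fluorine, core = smiles.split('.')

# Purely lexical substitution: swap the ring-closure digit 9 for a branch
# holding a mapped dummy atom. Nothing here checks chemical validity.
fragment = core.replace('9', '([*:9])')

print(fluorine)  # F9
print(core)      # [C@]91(C)CCO1
print(fragment)  # [C@]([*:9])1(C)CCO1
```

The substitution moves the attachment point from a ring-closure digit to a branch, so the ring-closure digit 1 now follows a closing parenthesis rather than the atom itself.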

Re: [Rdkit-discuss] RDKit appears to be parsing SMILES stereochemistry differently

2017-11-09 Thread Andrew Dalke
On Nov 9, 2017, at 08:13, Greg Landrum wrote:
> As was discussed in the comments of
> https://github.com/rdkit/rdkit/issues/786, I think it's pretty gross that the
> second syntax is even legal. But that's a side point.

To belabor that point. Neither Daylight SMILES nor OpenSMILES accept it, wh

Re: [Rdkit-discuss] RDKit appears to be parsing SMILES stereochemistry differently

2017-11-08 Thread Chris Earnshaw
Hi Surely the problem is that some of these SMILES aren't really valid. From the Daylight theory manual: '*The bonds are numbered in any order, designating ring opening (or ring closure) bonds by a digit immediately following the atomic symbol at each ring closure'* (my emphasis). So the behavio
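The Daylight rule quoted above (ring-closure digits immediately follow the atomic symbol) can be checked lexically. The sketch below is a hypothetical helper, not part of RDKit or any SMILES toolkit. It assumes a simplified reading of the grammar (bracket atoms, bond symbols, and `%nn` closures) and reports ring-closure digits that come directly after a closing branch parenthesis.

```python
# Bond symbols may sit between an atom and its ring-closure digit, so they
# are ignored when deciding what a digit "follows".
BOND_CHARS = set('-=#$:/\\')

def ring_closures_after_branch(smiles):
    """Return string indexes of ring closures that follow a ')'.

    Hypothetical, simplified scanner: it does not validate the SMILES.
    """
    hits = []
    in_bracket = False
    prev = ''  # last significant character seen outside brackets
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if in_bracket:
            # Digits inside [...] are isotopes, charges, or map numbers.
            if ch == ']':
                in_bracket = False
                prev = ']'
        elif ch == '[':
            in_bracket = True
        elif ch == '%' or ch.isdigit():
            if prev == ')':
                hits.append(i)
            if ch == '%':
                i += 2  # skip the two digits of a %nn closure
        elif ch not in BOND_CHARS:
            prev = ch
        i += 1
    return hits

print(ring_closures_after_branch('F[C@@]1(C)CCO1'))  # []
print(ring_closures_after_branch('C(F)1CCO1'))       # [4]
```

A string flagged by this check places a ring closure after a branch, which is the form the quoted rule excludes. Whether a given parser accepts or rejects it is a separate matter.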

Re: [Rdkit-discuss] RDKit appears to be parsing SMILES stereochemistry differently

2017-11-08 Thread Greg Landrum
On Thu, Nov 9, 2017 at 6:32 AM, Brian Cole wrote:
> Hi Cheminformaticians,
>
> This is an extreme subtlety in the interpretation of SMILES atom
> stereochemistry and I think a bug in RDKit. Specifically, I think the
> following SMILES should be the same molecule:
>
> >>> rdkit.__version__
> '2017

[Rdkit-discuss] RDKit appears to be parsing SMILES stereochemistry differently

2017-11-08 Thread Brian Cole
Hi Cheminformaticians,

This is an extreme subtlety in the interpretation of SMILES atom stereochemistry and I think a bug in RDKit. Specifically, I think the following SMILES should be the same molecule:

>>> rdkit.__version__
'2017.09.1'
>>> Chem.CanonSmiles('F[C@@]1(C)CCO1')
'C[C@]1(F)CCO1'
>>>