Re: [Rdkit-discuss] bad inchi or parsing problem?

2017-09-14 Thread Jason Biggs
Thanks to Curt, Markus, and John for helping me understand this. I knew that InChI had its limitations, but that didn't jump out at me here because there's no hydrogen migration between the different forms; I hadn't realized that these forms also qualify as tautomers. So this is definitely a feature

Re: [Rdkit-discuss] bad inchi or parsing problem?

2017-09-14 Thread Markus Sitzmann
On Thu, Sep 14, 2017 at 8:09 PM, Jason Biggs wrote:
> Okay, all three of these smiles strings resolve to the same inchi,
>
> "O=[N+](C1=NC2=CC=CC=C2N=C1)[N-](=O)C1=NC2=CC=CC=C2N=C1"
> "C1=CC=C2C(=C1)N=CC(=N2)N(=N(=O)C3=NC4=CC=CC=C4N=C3)=O"
>

Re: [Rdkit-discuss] bad inchi or parsing problem?

2017-09-14 Thread Markus Sitzmann
On Thu, Sep 14, 2017 at 7:38 PM, John Mayfield wrote:
> InChI is an identifier and not a representation, you should not read
> InChIs... but we are beyond hope there so...
>
Wonderfully said - unfortunately one day they decided to make InChIs "readable" ...

> The

Re: [Rdkit-discuss] bad inchi or parsing problem?

2017-09-14 Thread Curt Fischer
I'm not 100% sure about this particular case, but I suspect this is a limitation of InChI. For example, the InChI representation of zwitterionic phenylalanine (negative COO-, positive NH3+) and "neutral" phenylalanine (neutral COOH, neutral NH2) is exactly the same. This is by design. See
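
The phenylalanine point can be checked with a short script. This is only a sketch: it assumes RDKit is installed with InChI support, the two SMILES strings are my own illustrative choices (stereo left out for simplicity), and the helper names `to_inchi` and `same_inchi` are hypothetical, not part of RDKit.

```python
# Sketch only: assumes an RDKit build with InChI support.
# If RDKit (or its InChI module) is missing, the helpers return None.
try:
    from rdkit import Chem
    from rdkit.Chem import inchi as rd_inchi
    HAVE_INCHI = rd_inchi.INCHI_AVAILABLE
except ImportError:
    HAVE_INCHI = False

# Illustrative SMILES (assumed, not taken from the thread).
NEUTRAL = "NC(Cc1ccccc1)C(=O)O"               # neutral COOH / NH2
ZWITTERION = "[NH3+]C(Cc1ccccc1)C(=O)[O-]"    # COO- / NH3+


def to_inchi(smiles):
    """Return the InChI string for a SMILES, or None if it cannot be made."""
    if not HAVE_INCHI:
        return None
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:  # SMILES could not be parsed or sanitized
        return None
    return rd_inchi.MolToInchi(mol)


def same_inchi(smiles_a, smiles_b):
    """True/False if both InChIs were generated, None otherwise."""
    inchi_a, inchi_b = to_inchi(smiles_a), to_inchi(smiles_b)
    if inchi_a is None or inchi_b is None:
        return None
    return inchi_a == inchi_b


if __name__ == "__main__":
    print("neutral:   ", to_inchi(NEUTRAL))
    print("zwitterion:", to_inchi(ZWITTERION))
    print("same InChI:", same_inchi(NEUTRAL, ZWITTERION))
```

If the two forms really are collapsed by design, the last line should report that the strings match; comparing canonical SMILES instead would keep the two charge states distinct.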

Re: [Rdkit-discuss] bad inchi or parsing problem?

2017-09-14 Thread Jason Biggs
Okay, all three of these smiles strings resolve to the same inchi,

"O=[N+](C1=NC2=CC=CC=C2N=C1)[N-](=O)C1=NC2=CC=CC=C2N=C1"
"C1=CC=C2C(=C1)N=CC(=N2)N(=N(=O)C3=NC4=CC=CC=C4N=C3)=O"
"[O-][N+](c1cnc2c2n1)=[N+]([O-])c3cnc4c4n3"

even though to me they seem like different structures due to the

Re: [Rdkit-discuss] bad inchi or parsing problem?

2017-09-14 Thread John Mayfield
InChI is an identifier and not a representation, you should not read InChIs... but we are beyond hope there so... The InChI string is correct and is the same if you roundtrip your preferred one with charge separated bonds and the 5 valent one. All toolkits will use the InChI library to