Re: [Rdkit-discuss] canonical SMILES of a fragment

2017-08-02 Thread Pavel Polishchuk
Thanks Greg! I found an alternative solution, which is also not so straightforward. I set an isotope label on aromatic atoms, generate isomeric SMILES and do a regex replacement. But your suggestion to set remove hydrogens is important, since this can cause other ambiguity. import re
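The regex step of the workaround described above could look roughly like the sketch below. This is not the code from the original message (which is cut off here). It assumes a hypothetical sentinel isotope value of 42 was placed on the aromatic atoms before the isomeric SMILES was written, and that the labelled atoms appear as bracket atoms such as [42cH]. The helper name strip_isotope and the example string are made up for illustration.

```python
import re


def strip_isotope(smiles, isotope):
    """Remove one specific isotope label from bracket atoms in a SMILES string.

    Only labels that exactly equal `isotope` are removed: the number must
    directly follow '[' and be directly followed by an element symbol or '*'.
    Brackets are kept, so '[42cH]' becomes '[cH]'.
    """
    pattern = re.compile(r"\[" + re.escape(str(isotope)) + r"(?=[A-Za-z*])")
    return pattern.sub("[", smiles)


# Hypothetical labelled fragment SMILES (assumption: sentinel isotope 42).
labelled = "[42cH]1[42cH][42cH][42cH][42cH][42c]1C"
print(strip_isotope(labelled, 42))
```

Because the match is anchored to the opening bracket and followed by a lookahead, other isotope values such as [142c] or [420c] are left untouched.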

Re: [Rdkit-discuss] canonical SMILES of a fragment

2017-08-01 Thread Greg Landrum
Hi Pavel, It is, unfortunately, not that easy. The canonicalization algorithm does not use atomic aromaticity when determining atom ordering, so as far as it is concerned there is no difference between atoms 0 and 2 in either of your examples. What does get used is the number of hydrogens, so you

[Rdkit-discuss] canonical SMILES of a fragment

2017-08-01 Thread Pavel Polishchuk
Hi all, canonicalization of fragment SMILES does not work properly. Below there are two examples of identical fragments. The only difference is the order of atoms (indices). However, it seems that RDKit canonicalization does not take into account atom types. Does someone have an idea

Re: [Rdkit-discuss] canonical smiles for fragments with map numbers

2017-05-27 Thread Pavel Polishchuk
Thank you, Brian! Actually, what I expected as output: S=c1c([*:1])c(Cl)[nH]c([*:3])c1[*:2] S=c1c([*:1])c(Cl)[nH]c([*:2])c1[*:3] S=c1c([*:2])c(Cl)[nH]c([*:1])c1[*:3] and so on. You gave me the right direction. I can store old-new maps in a dict and after relabeling and producing of canonical
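The idea of keeping an old-to-new mapping in a dict can be sketched at the string level as follows. This is only an illustration of the bookkeeping, not the approach from the thread's attached example: it does not canonicalize anything itself and assumes the SMILES has already been produced by a toolkit. The function name relabel_map_numbers is hypothetical.

```python
import re

# Matches the ':<number>]' part of a mapped bracket atom such as [*:3].
MAP_RE = re.compile(r":(\d+)\]")


def relabel_map_numbers(smiles):
    """Renumber atom-map labels 1, 2, 3, ... in order of appearance.

    Returns the relabelled SMILES and a dict {old_map: new_map}, so the
    original labels can be looked up again afterwards.
    """
    old_to_new = {}

    def _swap(match):
        old = int(match.group(1))
        # A map number seen for the first time gets the next free label.
        new = old_to_new.setdefault(old, len(old_to_new) + 1)
        return f":{new}]"

    return MAP_RE.sub(_swap, smiles), old_to_new


relabelled, mapping = relabel_map_numbers("S=c1c([*:2])c(Cl)[nH]c([*:1])c1[*:3]")
print(relabelled)  # S=c1c([*:1])c(Cl)[nH]c([*:2])c1[*:3]
print(mapping)     # {2: 1, 1: 2, 3: 3}
```

Inverting the returned dict gives the new-to-old mapping needed to restore the original labels after any further processing.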

Re: [Rdkit-discuss] canonical smiles for fragments with map numbers

2017-05-27 Thread Brian Kelley
Pavel, this isn't exactly trivial so I went ahead and made an example. The basics are that atomMaps are canonicalized, i.e. their value is used in the generation of smiles. To solve this problem: 1) backup the atom maps and remove them 2) canonicalize *without* atom maps but figure out the order

Re: [Rdkit-discuss] Canonical smiles for medium and large rings?

2011-01-04 Thread James Davidson
Hi Greg, On Sat, Dec 18, 2010 at 6:27 AM, Greg Landrum greg.land...@gmail.com wrote: I just checked in a set of changes that should get this (mostly) working correctly. Here's a demonstration with Geldanamycin: In [7]:

Re: [Rdkit-discuss] Canonical smiles for medium and large rings?

2010-12-28 Thread Greg Landrum
On Sat, Dec 18, 2010 at 6:27 AM, Greg Landrum greg.land...@gmail.com wrote:  For 'classic' aliphatic systems, double-bonds in 3-7-membered rings can only sensibly exist in the cis orientation, so 'ignoring' them would be ok.  However, for 8-membered and above, cis or trans are certainly

[Rdkit-discuss] Canonical smiles for medium and large rings?

2010-12-17 Thread James Davidson
Dear All, I have been investigating an issue that a colleague of mine identified. He was working with the RDKit Canon Smiles node in Knime, and found that for the natural product, Geldanamycin, the double-bond geometry information was being lost during canonicalisation. I repeated this result

Re: [Rdkit-discuss] Canonical SMILES

2009-02-17 Thread George Oakman
On Feb 13, 2009, at 9:14 PM, TJ O'Donnell wrote: Yes, InChI is unique across different packages. This is because there is one definitive source for the code and algorithm. This was a design goal of InChI. Or to twist TJ's words around

Re: [Rdkit-discuss] Canonical SMILES

2009-02-17 Thread Andrew Dalke
On Feb 17, 2009, at 9:18 AM, George Oakman wrote: Does someone know if I can assume that the canonical SMILES of RDKit are the same as the Open Babel ones? I wouldn't assume that without a lot of testing. My assumption is that canonical SMILES generation is so implementation sensitive that

Re: [Rdkit-discuss] Canonical SMILES

2009-02-17 Thread Noel O'Boyle
2009/2/17 Andrew Dalke da...@dalkescientific.com: On Feb 17, 2009, at 9:18 AM, George Oakman wrote: Does someone know if I can assume that the canonical SMILES of RDKit are the same as the Open Babel ones? You can assume they are not the same. No attempt has been made to make them consistent.

Re: [Rdkit-discuss] Canonical SMILES

2009-02-17 Thread Greg Landrum
On Fri, Feb 13, 2009 at 11:21 PM, Andrew Dalke da...@dalkescientific.com wrote: On Feb 13, 2009, at 9:14 PM, TJ O'Donnell wrote: Yes, InChI is unique across different packages. This is because there is one definitive source for the code and algorithm. This was a design goal of InChI. Or

[Rdkit-discuss] Canonical SMILES

2009-02-13 Thread George Oakman
Hi all, I am very new to the RDKit and am in the process of running a few tests to understand how things are working. One of the first examples I have been playing with is the canonical SMILES for Aspirin. This is the piece of code I put together: RWMol *mol=new RWMol(); //Atoms for

Re: [Rdkit-discuss] Canonical SMILES

2009-02-13 Thread Andrew Dalke
On Feb 13, 2009, at 6:20 PM, George Oakman wrote: One of the first examples I have been playing with is the canonical SMILES for Aspirin. .. This gave me the following result: CC(Oc1c1C(O)=O)=O But I was expecting CC(=O)Oc1c1C(=O)O) The canonical SMILES is canonical only