Yes, I actually exposed that function to Python in RDKit :)

Be aware that the canonical rank and the output order aren't the same thing.  
The rank is what is used during graph traversal, when making the SMILES string, 
to choose which atom to go to next.  The output order records which atoms were 
output first, second, third, and so on in the output SMILES string.  The two are 
not necessarily the same.  

Both should, however, be unique for the input graph.  In either case, explicit 
hydrogens have to be added so that they are part of the graph.

For reference:

order = Chem.CanonicalRankAtoms(m, includeChirality=True)

Is the function being discussed.
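
To see the two orderings side by side, something like the following rough 
sketch may help.  Ethane is just an arbitrary test molecule and the variable 
names are mine; the output order only exists after MolToSmiles has been called 
on the molecule.

```python
from rdkit import Chem

# Explicit hydrogens must be in the graph to get ranks/orders for them.
m = Chem.AddHs(Chem.MolFromSmiles("CC"))

# ranks[i] is the canonical rank of the atom with index i.
ranks = list(Chem.CanonicalRankAtoms(m, includeChirality=True))

# Writing the SMILES stores the output order as a private, computed property.
smiles = Chem.MolToSmiles(m)
props = m.GetPropsAsDict(includePrivate=True, includeComputed=True)
# output_order[k] is the index of the atom written k-th in the SMILES.
output_order = list(props["_smilesAtomOutputOrder"])

# RenumberAtoms takes, for each new position, the old atom index,
# so the output order can be passed in as is.
m_out = Chem.RenumberAtoms(m, output_order)

print(smiles)
print("ranks:       ", ranks)
print("output order:", output_order)
```

Both lists are permutations of the atom indices, but they answer different 
questions: ranks is indexed by atom, output_order is indexed by position in 
the SMILES string.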

And as a bonus:

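ranks = list(order)
mol_ordered = Chem.RenumberAtoms(m, sorted(range(len(ranks)), key=ranks.__getitem__))

Will make a copy in canonical atom order, but not canonical smiles output order.
(RenumberAtoms expects the old atom index for each new position, so the atom 
indices are sorted by rank rather than passing the ranks in directly.)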
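
A quick way to convince yourself that the renumbering does not depend on the 
input atom order is sketched below.  The helper name canonical_renumber and the 
ethanol test case are my own choices for illustration, not anything in RDKit.  
RenumberAtoms takes, for each new position, the index of the old atom, so the 
sketch sorts the atom indices by rank.

```python
from rdkit import Chem


def canonical_renumber(mol):
    """Return a copy of mol with its atoms sorted by canonical rank."""
    ranks = list(Chem.CanonicalRankAtoms(mol, includeChirality=True))
    # new_order[k] is the old index of the atom that becomes atom k.
    new_order = sorted(range(mol.GetNumAtoms()), key=lambda i: ranks[i])
    return Chem.RenumberAtoms(mol, new_order)


# The same molecule (ethanol, explicit hydrogens) in two input atom orders.
a = Chem.AddHs(Chem.MolFromSmiles("CCO"))
b = Chem.AddHs(Chem.MolFromSmiles("OCC"))

symbols_a = [atom.GetSymbol() for atom in canonical_renumber(a).GetAtoms()]
symbols_b = [atom.GetSymbol() for atom in canonical_renumber(b).GetAtoms()]

# After renumbering, the element sequence should no longer depend on the input.
print(symbols_a)
print(symbols_b)
```

For a molecule read from a mol file the same helper should apply unchanged, 
provided the hydrogens are present in the graph.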

----
Brian Kelley

> On Mar 10, 2016, at 7:36 AM, Maciek Wójcikowski <mac...@wojcikowski.pl> wrote:
> 
> Hi,
> 
> A few months back Greg added CanonicalRankAtoms to rdkit.Chem after my 
> similar question.
> http://www.rdkit.org/Python_Docs/rdkit.Chem.rdmolfiles-module.html#CanonicalRankAtoms
> 
> ----
> Best regards,
> Maciek Wójcikowski
> mac...@wojcikowski.pl
> 
> 2016-03-10 13:18 GMT+01:00 Michal Krompiec <michal.kromp...@gmail.com>:
>> Thanks a lot, this is exactly what I wanted.
>> Best regards,
>> Michal
>> 
>>> On 10 March 2016 at 12:13, Brian Kelley <fustiga...@gmail.com> wrote:
>>> The canonicalizer doesn't treat hydrogens any differently than any other 
>>> atom, but they have to be in the graph.  If you are starting from smiles, 
>>> simply add explicit hydrogens, python example below:
>>> 
>>> >>> from rdkit import Chem
>>> >>> m = Chem.MolFromSmiles("CC")
>>> >>> mh = Chem.AddHs(m)
>>> >>> Chem.MolToSmiles(mh)
>>> '[H]C([H])([H])C([H])([H])[H]'
>>> >>> order = eval(mh.GetProp("_smilesAtomOutputOrder"))
>>> # safer non eval version...
>>> >>> order = mh.GetPropsAsDict(includePrivate=True,
>>> ...                           includeComputed=True)['_smilesAtomOutputOrder']
>>> >>> list(order)
>>> [2,0,3,4,1,5,6,7]
>>> >>> 
>>> 
>>> Note that the output order is from the context of the output smiles string, 
>>> i.e. order[0] is the index of the input atom that was output first, and so 
>>> on.  I.e. order[output_atom_idx] = input_atom_idx
>>> 
>>>> On Thu, Mar 10, 2016 at 6:27 AM, Michal Krompiec 
>>>> <michal.kromp...@gmail.com> wrote:
>>>> Hello,
>>>> I need a "canonical" method for generating atom indices for a given 
>>>> molecule (with 3D coordinates, so the input is e.g. a mol file), for a 
>>>> molecular descriptor which should be invariant with respect to atom 
>>>> indexing. As I understand, canonical SMILES will give the same atom 
>>>> indices for non-hydrogen atoms, but is there a way in RDKit to generate 
>>>> unique indices for hydrogens as well?
>>>> Best regards,
>>>> Michal
>>>> 
>>>> ------------------------------------------------------------------------------
>>>> Transform Data into Opportunity.
>>>> Accelerate data analysis in your applications with
>>>> Intel Data Analytics Acceleration Library.
>>>> Click to learn more.
>>>> http://pubads.g.doubleclick.net/gampad/clk?id=278785111&iu=/4140
>>>> _______________________________________________
>>>> Rdkit-discuss mailing list
>>>> Rdkit-discuss@lists.sourceforge.net
>>>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>> 
>> 
> 