Re: [Rdkit-discuss] canonical atom indexing

2016-03-10 Thread Greg Landrum
On Fri, Mar 11, 2016 at 5:41 AM, Peter S. Shenkin  wrote:

> Seems that somewhere in the guts of RDKit there might well be code that
> divides atoms into equivalence classes.
>
> In the most common (tetravalent, tetrahedral) chiral situation, if the
> tetrahedral center's four connected atoms fell into 3 equivalence classes,
> the center would be prochiral. Then we'd have something like aC(b)(b)d,
> where a, b, and d represent the equivalence classes of the directly
> attached atoms.
>

Assuming I'm understanding you correctly, that's what the code below is
doing. When you tell CanonicalRankAtoms() not to breakTies, it just finds
the equivalence classes.
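To make those equivalence classes explicit, the rank list can be grouped by value. This is a minimal sketch in plain Python, assuming `ranks` is the per-atom list you get from list(Chem.CanonicalRankAtoms(mol, breakTies=False)). The helper name `equivalence_classes` and the example rank values are made up for illustration; they are not RDKit output.

```python
from collections import defaultdict

def equivalence_classes(ranks):
    """Group atom indices that share a rank.

    ranks[i] is assumed to be the rank of atom i, e.g. from
    list(Chem.CanonicalRankAtoms(mol, breakTies=False)).
    """
    classes = defaultdict(list)
    for atom_idx, rank in enumerate(ranks):
        classes[rank].append(atom_idx)
    # order the classes by rank so the result is deterministic
    return [classes[rank] for rank in sorted(classes)]

# Hypothetical rank list for a five-atom molecule in which
# atoms 1 and 2 are equivalent, as are atoms 3 and 4.
example_ranks = [4, 2, 2, 0, 0]
print(equivalence_classes(example_ranks))
```

Any class with more than one member is a set of atoms the canonicalizer could not tell apart without breaking ties.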


> In the past, for a core-hopping application, I've stored a core library as
> the SMILES of the stripped and H-substituted core. I've wished that I could
> use SMILES '@' notation to specify which of the two H's originally held the
> substituent (when only one did).
>
>
You could do that with the current RDKit by using either an isotopic label
or an atom map number on the H's that replaced the side chain, *or* by using
dummies to mark the attachment points (this last is what
Chem.ReplaceSidechains does).

The atom map number is used in the canonicalization, as this example
demonstrates:

In [34]: m = Chem.MolFromSmiles('C[C@](F)(Cl)Br')

In [35]: m.GetAtomWithIdx(2).SetAtomicNum(1)

In [36]: m.GetAtomWithIdx(3).SetAtomicNum(1)

In [37]: Chem.MolToSmiles(m,True)
Out[37]: '[H][C@]([H])(C)Br'

In [38]: m.GetAtomWithIdx(2).SetProp('molAtomMapNumber','3')

In [39]: m.GetAtomWithIdx(3).SetProp('molAtomMapNumber','1')

In [40]: Chem.MolToSmiles(m,True)
Out[40]: 'C[C@@](Br)([H:1])[H:3]'

In [41]: m.GetAtomWithIdx(2).SetProp('molAtomMapNumber','1')

In [42]: m.GetAtomWithIdx(3).SetProp('molAtomMapNumber','3')

In [43]: Chem.MolToSmiles(m,True)
Out[43]: 'C[C@](Br)([H:1])[H:3]'


--
Transform Data into Opportunity.
Accelerate data analysis in your applications with
Intel Data Analytics Acceleration Library.
Click to learn more.
http://pubads.g.doubleclick.net/gampad/clk?id=278785111=/4140___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] canonical atom indexing

2016-03-10 Thread Peter S. Shenkin
Seems that somewhere in the guts of RDKit there might well be code that
divides atoms into equivalence classes.

In the most common (tetravalent, tetrahedral) chiral situation, if the
tetrahedral center's four connected atoms fell into 3 equivalence classes,
the center would be prochiral. Then we'd have something like aC(b)(b)d,
where a, b, and d represent the equivalence classes of the directly
attached atoms.
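The aC(b)(b)d criterion can be written down directly. This is a minimal sketch in plain Python, assuming the equivalence class of each of the four attached atoms is already known (for instance from a tie-free canonical ranking). The function name and the class labels are hypothetical.

```python
def is_prochiral_pattern(neighbor_classes):
    """True if four neighbors fall into exactly three classes: aC(b)(b)d."""
    return len(neighbor_classes) == 4 and len(set(neighbor_classes)) == 3

print(is_prochiral_pattern(["a", "b", "b", "d"]))  # exactly one duplicated class
print(is_prochiral_pattern(["a", "b", "c", "d"]))  # four classes: already a stereocenter candidate
print(is_prochiral_pattern(["a", "a", "b", "b"]))  # two classes: not prochiral by this test
```

This only encodes the simple tetrahedral case described above and ignores dependent stereochemistry.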

In the past, for a core-hopping application, I've stored a core library as
the SMILES of the stripped and H-substituted core. I've wished that I could
use SMILES '@' notation to specify which of the two H's originally held the
substituent (when only one did).

-P.



[Rdkit-discuss] Stereochemistry question

2016-03-10 Thread Greg Landrum
Dear all,

Here's a question for the chemists in the group: do we need to be concerned
about representing the stereochemistry of the P=C bond in substructures
like O=P(/O)=C/C under normal circumstances?

Here's a PubChem compound example that has the double bond crossed
(possibly leading one to believe that it could have stereochemistry):
https://pubchem.ncbi.nlm.nih.gov/compound/56981965
Here's the corresponding substance record (which is how PubChem received
the structure):
https://pubchem.ncbi.nlm.nih.gov/substance/135741697

Another example, this time without the crossed bond in the compound record:
https://pubchem.ncbi.nlm.nih.gov/compound/87396055

Best,
-greg


Re: [Rdkit-discuss] canonical atom indexing

2016-03-10 Thread Greg Landrum
This isn't an area I've thought much about, so this may be a bit naive.

It seems like the interesting atom from the perspective of perception is
the carbon that the Hs are attached to, not the Hs themselves; it's the
carbon that will become a chiral center.

If we neglect dependent stereochemistry for the moment, can't we just find
carbon atoms that have two hydrogen substituents and then look to see if
they (the carbons) have two differently ranked neighbors? The advantage to
this is that it saves the addHs step and will generally be a lot faster to
execute.

Here's a quick code snippet; it lets you choose between using CIP ranks to
distinguish atoms or just using their canonical rankings.

from rdkit import Chem

def findProchiral(mol,useCIPRanks=False):
    if not useCIPRanks:
        ranks = Chem.CanonicalRankAtoms(mol,breakTies=False)
    else:
        Chem.AssignStereochemistry(mol)
        ranks = [x.GetProp('_CIPRank') for x in mol.GetAtoms()]
    res = []
    for atom in mol.GetAtoms():
        # only consider atoms with two heavy neighbors and two Hs.
        # could probably be further limited to just consider carbons
        if atom.GetTotalDegree()!=4 or atom.GetTotalNumHs()!=2:
            continue
        hvyNbrRanks=[]
        for nbr in atom.GetNeighbors():
            if nbr.GetAtomicNum()>1:
                hvyNbrRanks.append(ranks[nbr.GetIdx()])
                if len(hvyNbrRanks)==2:
                    break
        if hvyNbrRanks[0] != hvyNbrRanks[1]:
            res.append(atom.GetIdx())
    return res


Is that headed in the right direction?
-greg



On Thu, Mar 10, 2016 at 5:30 PM, Peter S. Shenkin  wrote:

> Is the canonical rank of prochiral H's different or the same? (For example
> the rank of the H's on C-1 of ethyl chloride.)
>
> Thanks,
> -P.
>


Re: [Rdkit-discuss] canonical atom indexing

2016-03-10 Thread Brian Kelley
No, prochiral atoms have the same rank.  Your question got me thinking about
how we could detect prochiral atoms. Here is the stupidest/simplest
solution I could come up with: it changes the isotope on each atom in turn,
and if that adds a new unspecified stereo center, the atom is considered
prochiral:

from rdkit import Chem

def numUnspecifiedStereoAtoms(mol):
    """Return the number of unspecified stereo atoms in a molecule"""
    # HasProp() is used here because GetPropNames() may skip private
    # (underscore-prefixed) properties by default in some RDKit versions
    return len([atom for atom in mol.GetAtoms() if
                (atom.HasProp("_ChiralityPossible") and
                 atom.GetChiralTag() ==
                 Chem.rdchem.ChiralType.CHI_UNSPECIFIED)])

def findProchiral(m):
    """Return indices of prochiral atoms, to find prochiral hydrogens,
    hydrogens must appear in the graph, see Chem.AddHs"""
    indices = [atom.GetIdx() for atom in m.GetAtoms()]
    num_unspecified = numUnspecifiedStereoAtoms(m)
    prochiral = []
    for index in indices:
        # label one atom at a time on a fresh copy of the molecule
        m2 = Chem.Mol(m)
        m2.GetAtomWithIdx(index).SetIsotope(2)
        m3 = Chem.MolFromSmiles(Chem.MolToSmiles(m2, isomericSmiles=True))
        if numUnspecifiedStereoAtoms(m3) != num_unspecified:
            prochiral.append(index)
    return prochiral

print(findProchiral(Chem.AddHs(Chem.MolFromSmiles("C1C(C(N)=O)=CNC=C1"))))

On Thu, Mar 10, 2016 at 11:30 AM, Peter S. Shenkin 
wrote:

> Is the canonical rank of prochiral H's different or the same? (For example
> the rank of the H's on C-1 of ethyl chloride.)
>
> Thanks,
> -P.
>


Re: [Rdkit-discuss] canonical atom indexing

2016-03-10 Thread Peter S. Shenkin
Is the canonical rank of prochiral H's different or the same? (For example
the rank of the H's on C-1 of ethyl chloride.)

Thanks,
-P.


Re: [Rdkit-discuss] Merge two fragments at connection point

2016-03-10 Thread Steven Combs
Thanks!

I ended up doing this:

1) convert Rosetta restypes to RDKit ROMols
2) use combineMols(), to get a merged ROMol
3) static cast to RWMol and add bond between connecting atoms
4) Convert back to Rosetta
5) Use Rosetta transformations on the atoms that needed to be moved (I
didn't find the Transform methods in RDKit in time!)
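
For step 3, the main piece of bookkeeping is the index of the second fragment's attachment atom in the merged molecule. This is a minimal sketch in Python rather than C++, assuming (as I understand combineMols() to behave) that the second molecule's atoms are appended after the first's, so their indices are shifted by the first molecule's atom count. The function name and the numbers are hypothetical.

```python
def bond_endpoints_in_combined(n_atoms_mol1, attach_idx_mol1, attach_idx_mol2):
    """Return the (begin, end) atom indices for the new bond in the merged molecule.

    Assumes the merged molecule lists mol1's atoms first, followed by mol2's,
    so a mol2 index is offset by the number of atoms in mol1.
    """
    if not 0 <= attach_idx_mol1 < n_atoms_mol1:
        raise ValueError("attach_idx_mol1 is not a valid index in mol1")
    if attach_idx_mol2 < 0:
        raise ValueError("attach_idx_mol2 must be non-negative")
    return attach_idx_mol1, n_atoms_mol1 + attach_idx_mol2

# Hypothetical fragments: mol1 has 12 atoms and attaches at atom 5,
# mol2 attaches at its atom 0.
print(bond_endpoints_in_combined(12, 5, 0))
```

The resulting pair is what would be passed to addBond() on the RWMol. If hydrogens are removed before combining, the indices are best looked up after the removal, since deleting atoms can renumber the rest.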

Steven Combs


On Thu, Mar 10, 2016 at 12:23 AM, Greg Landrum 
wrote:

> Hi Steven,
>
> On Mon, Mar 7, 2016 at 5:03 PM, Steven Combs 
> wrote:
>
>>
>> I am a developer for Rosetta. Recently
>> we have incorporated RDKit into our software suite for testing. In
>> particular, my project is to include drug design elements into the game
>> Foldit. I have been using a separate
>> cheminformatics tool for this process, but recently have decided to
>> switch to RDKit because it is so much easier to cross-compile the library
>> than the other cheminformatics suite.
>>
>
> Welcome to the community, and thanks for letting us know what you're doing
> with the RDKit! It's great to hear that the toolkit is starting to be used
> within Rosetta.
>
>
>> I am using the C++ api for RDKit.
>>
>> My question is pretty simple. If I have two 3D fragments and want to
>> connect them at a specific atom, how would I go about doing this? I know
>> there are two functions that I can use in ChemTransforms.h,
>> replaceSubstructs and combineMols. I want to keep the 3D conformation of
>> mol1, but transform and combine mol2 to mol1 at a specific atom (ie replace
>> a hydrogen on the atom in mol1 and create a bond between the parent atom's
>> hydrogen to a specific atom on mol2).
>>
>> Is replaceSubstructs the way to go? I am not exactly sure how to set up
>> the query for this.
>>
>
> I feel like this question has come up before but I cannot, for the life of
> me, find the thread to point you to.
>
> I would start by transforming mol2 so that its atoms are in the positions
> you want them in when the molecules are combined. The convenient way to do
> this is with MolTransforms::transformMolsAtoms() and the appropriate
> RDGeom::Transform3D
> (http://rdkit.org/docs/cppapi/classRDGeom_1_1Transform3D.html#a5715778fa5c085ed42f47acd2649721c).
>
> Once mol2 is appropriately oriented, I'd convert it to an RWMol (if it's
> not one already) and remove the H on the atom that's going to bond to mol1.
> I'd convert mol1 into an RWMol (if it's not one already) and remove the H
> on the atom where I want to attach mol2.
> You can then call newMol = combineMols(mol1,mol2); and you've got a single
> molecule that has all atoms in it.
> The last step is to cast the result of combineMols() to an RWMol and add
> the bond between the molecules.
>
> I hope this helps, but please let me know if you need more details,
>
> -greg
>
>


Re: [Rdkit-discuss] canonical atom indexing

2016-03-10 Thread Greg Landrum
On Thu, Mar 10, 2016 at 2:04 PM, Brian Kelley  wrote:

> Yes, I actually exposed that function to Python  in Rdkit :)
>
> Be aware that the canonical rank and the output order aren't the same
> thing.  The rank is what is used during graph traversal, when making the
> SMILES string, to choose which atom to go to next.  The output order is which
> atoms were output first, second, third in the output SMILES string.  They
> are not necessarily the same.
>
> Both should, however, be unique for the input graph, but in either case
> explicit hydrogens should be added.
>

Exactly. If you just want a canonical ordering of the atoms, there is no
reason to generate the SMILES. You can just use Chem.CanonicalRankAtoms().

The "_smilesAtomOutputOrder" property used to be the only way to get a
canonical ordering. It is still useful if you care about the order of atoms
in the output SMILES, but is quite inefficient if all you want is a
canonical ordering.

-greg


Re: [Rdkit-discuss] canonical atom indexing

2016-03-10 Thread Brian Kelley
Yes, I actually exposed that function to Python  in Rdkit :)

Be aware that the canonical rank and the output order aren't the same thing.
The rank is what is used during graph traversal, when making the SMILES string,
to choose which atom to go to next.  The output order is which atoms were output
first, second, third in the output SMILES string.  They are not necessarily the
same.

Both should, however, be unique for the input graph, but in either case 
explicit hydrogens should be added.

For reference:

order = Chem.CanonicalRankAtoms(m, includeChirality=True)

Is the function being discussed.

And as a bonus:

mol_ordered = Chem.RenumberAtoms(m, list(order))

Will make a copy in canonical atom order, but not canonical smiles output order.
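
One thing worth checking before relying on this: as far as I can tell from the RenumberAtoms documentation, position i of its newOrder argument holds the original index of the atom that should end up at position i, whereas a rank list holds, at position i, the rank of original atom i. If that reading is right, the ranks may need converting first. This is a minimal sketch in plain Python; `ranks_to_new_order` is a hypothetical helper and the rank values are made up.

```python
def ranks_to_new_order(ranks):
    """Return atom indices sorted by their rank (an argsort).

    ranks[i] is assumed to be the rank of original atom i; the result
    lists original atom indices from lowest rank to highest.
    """
    return sorted(range(len(ranks)), key=lambda atom_idx: ranks[atom_idx])

# Hypothetical ranks: atom 1 ranks first, then atom 3, atom 0, atom 2.
example_ranks = [2, 0, 3, 1]
new_order = ranks_to_new_order(example_ranks)
print(new_order)
```

With RDKit this would be used as Chem.RenumberAtoms(m, ranks_to_new_order(list(order))). Because Python's sort is stable, tied ranks keep their original relative order.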


Brian Kelley



Re: [Rdkit-discuss] canonical atom indexing

2016-03-10 Thread Maciek Wójcikowski
Hi,

A few months back Greg added CanonicalRankAtoms to rdkit.Chem after my
similar question.
http://www.rdkit.org/Python_Docs/rdkit.Chem.rdmolfiles-module.html#CanonicalRankAtoms


Pozdrawiam,  |  Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl



Re: [Rdkit-discuss] canonical atom indexing

2016-03-10 Thread Michal Krompiec
Thanks a lot, this is exactly what I wanted.
Best regards,
Michal



Re: [Rdkit-discuss] canonical atom indexing

2016-03-10 Thread Brian Kelley
The canonicalizer doesn't treat hydrogens any differently than any other
atom, but they have to be in the graph.  If you are starting from smiles,
simply add explicit hydrogens, python example below:

>>> from rdkit import Chem
>>> m = Chem.MolFromSmiles("CC")
>>> mh = Chem.AddHs(m)
>>> Chem.MolToSmiles(mh)
'[H]C([H])([H])C([H])([H])[H]'
>>> order = eval(mh.GetProp("_smilesAtomOutputOrder"))
>>> # safer non eval version...
>>> order = mh.GetPropsAsDict(includePrivate=True,
...                           includeComputed=True)['_smilesAtomOutputOrder']
>>> list(order)
[2,0,3,4,1,5,6,7]

Note that the output order is from the context of the output SMILES string,
i.e. order[0] is the index in the original molecule of the atom that was output
first, and so on.  I.e. order[output_atom_idx] = input_atom_idx
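
If what you need is the opposite lookup (where did input atom i end up in the SMILES?), the list can be inverted. This is a minimal sketch in plain Python, using the order list from the session above; `invert_order` is a hypothetical helper name.

```python
def invert_order(order):
    """Given order[output_idx] = input_idx, return inverse[input_idx] = output_idx."""
    inverse = [None] * len(order)
    for output_idx, input_idx in enumerate(order):
        inverse[input_idx] = output_idx
    return inverse

order = [2, 0, 3, 4, 1, 5, 6, 7]  # the list shown in the session above
print(invert_order(order))
```

Here input atom 0 (the first carbon) lands at output position 1, since the SMILES string starts with one of its hydrogens.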

On Thu, Mar 10, 2016 at 6:27 AM, Michal Krompiec 
wrote:

> Hello,
> I need a "canonical" method for generating atom indices for a given
> molecule (with 3D coordinates, so the input is e.g. a mol file), for a
> molecular descriptor which should be invariant with respect to atom
> indexing. As I understand, canonical SMILES will give the same atom indices
> for non-hydrogen atoms, but is there a way in RDKit to generate unique
> indices for hydrogens as well?
> Best regards,
> Michal
>


[Rdkit-discuss] canonical atom indexing

2016-03-10 Thread Michal Krompiec
Hello,
I need a "canonical" method for generating atom indices for a given
molecule (with 3D coordinates, so the input is e.g. a mol file), for a
molecular descriptor which should be invariant with respect to atom
indexing. As I understand, canonical SMILES will give the same atom indices
for non-hydrogen atoms, but is there a way in RDKit to generate unique
indices for hydrogens as well?
Best regards,
Michal