Re: [Rdkit-discuss] how to get indexes and atoms with H from smiles

2023-05-09 Thread Francois Berenger

Hello,

Maybe you can use this:
Chem.MolToSmiles(mol, allHsExplicit=True)

This will place each heavy atom between '[' and ']' and give you the
number of hydrogens for each.
It gets easier to work with SMILES strings after this (you no longer
need a full-blown SMILES parser).
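
A minimal sketch of what that string handling could look like, using only the standard library. The SMILES literal below is an assumed example of the fully bracketed form, not captured RDKit output, and split_h_count is a hypothetical helper that only handles simple terms (element symbol plus optional H count, no charges, isotopes or atom maps):

```python
import re

# Assumed example of a fully bracketed SMILES string (illustrative
# literal, not captured RDKit output).
smiles = "[NH2][C](=[O])[c]1[cH][cH][cH][cH][cH]1"

# Each atom term is the text between '[' and ']'.
atom_terms = re.findall(r"\[([^\]]+)\]", smiles)


def split_h_count(term):
    """Split a simple atom term such as 'NH2' into (symbol, H count)."""
    match = re.fullmatch(r"([A-Za-z][a-z]?)(?:H(\d*))?", term)
    if match is None:
        raise ValueError(f"unsupported atom term: {term!r}")
    symbol, count = match.group(1), match.group(2)
    if count is None:
        # No 'H' in the term at all.
        return symbol, 0
    # A bare 'H' with no digits means exactly one hydrogen.
    return symbol, int(count) if count else 1


parsed = [split_h_count(term) for term in atom_terms]
for term, (symbol, hcount) in zip(atom_terms, parsed):
    print(term, symbol, hcount)
```

Note that the terms come out in the order of the SMILES string, which need not be the order of the atoms in the molecule.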

Regards,
F.

On 09/05/2023 14:55, Haijun Feng wrote:

[1]

Hi All,

I am trying to add atom numbers to a SMILES string, as below:

from rdkit import Chem
mol=Chem.MolFromSmiles('c1ccccc(C(N)=O)1')
for i, atom in enumerate(mol.GetAtoms()):
  atom.SetProp('molAtomMapNumber',str(i))
smi=Chem.MolToSmiles(mol)
print(smi)

the output is: [cH:0]1[cH:1][cH:2][cH:3][cH:4][c:5]1C:6=[O:8]

Then I want to split the SMILES into atoms. I did it like this:

from rdkit import Chem
mol=Chem.MolFromSmiles('c1ccccc(C(N)=O)1')
for i, atom in enumerate(mol.GetAtoms()):
  atom.SetProp('molAtomMapNumber',str(i))
  print(i,atom.GetSymbol())

the output is:

0 C
1 C
2 C
3 C
4 C
5 C
6 C
7 N
8 O

But what I do want is something like this (with fragments instead of
atoms):

0 cH
1 CH
...
7 NH2
8 O

Can anyone help me figure out how to get each atom with H from the
smiles as above. Thanks so much!

best,

Hal

Links:
--
[1] https://stackoverflow.com/posts/76197437/timeline
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss





Re: [Rdkit-discuss] how to get indexes and atoms with H from smiles

2023-05-09 Thread Andrew Dalke
On May 9, 2023, at 07:55, Haijun Feng  wrote:
> Can anyone help me figure out how to get each atom with H from the smiles as 
> above. Thanks so much!

Try using Chem.MolFragmentToSmiles to get the SMILES for each atom, with all 
hydrogens explicit, then strip off the leading and trailing []s.

from rdkit import Chem
mol=Chem.MolFromSmiles('c1ccccc(C(N)=O)1')
for i, atom in enumerate(mol.GetAtoms()):
  # SMILES for this single atom, with its hydrogens written explicitly
  atom_smi = Chem.MolFragmentToSmiles(mol, allHsExplicit=True,
                                      atomsToUse=[atom.GetIdx()])
  print(i, atom_smi.strip("[]"))

This prints

0 cH
1 cH
2 cH
3 cH
4 cH
5 c
6 C
7 NH2
8 O

Your code showed you using

   atom.SetProp('molAtomMapNumber',str(i))

In the following, I'll set that property *after* getting the atom SMILES, so 
the map is not included as part of the output:

from rdkit import Chem
mol=Chem.MolFromSmiles('c1ccccc(C(N)=O)1')
for i, atom in enumerate(mol.GetAtoms()):
  atom_smi = Chem.MolFragmentToSmiles(mol, allHsExplicit=True,
                                      atomsToUse=[atom.GetIdx()])
  print(i, atom_smi.strip("[]"))
  # Set the map number only after the atom SMILES has been generated
  atom.SetIntProp("molAtomMapNumber", i)

print(Chem.MolToSmiles(mol))

which gives the output

0 cH
1 cH
2 cH
3 cH
4 cH
5 c
6 C
7 NH2
8 O
[cH:0]1[cH:1][cH:2][cH:3][cH:4][c:5]1[C:6]([NH2:7])=[O:8]



> the output is: [cH:0]1[cH:1][cH:2][cH:3][cH:4][c:5]1C:6=[O:8]

For what it's worth, I get a slightly different result:

   [cH:0]1[cH:1][cH:2][cH:3][cH:4][c:5]1[C:6]([NH2:7])=[O:8] 

You should be aware that the input order and the output SMILES order might be 
different.

Because of the simpler structure of your preferred output SMILES format, you 
can alternatively extract the atom terms from the output string by looking for 
the substrings inside of the []s, as in the following:

>>> import re
>>> re.compile(r'\[[^]]+\]').findall("[cH:0]1[cH:1][cH:2][cH:3][cH:4][c:5]1[C:6]([NH2:7])=[O:8]")
['[cH:0]', '[cH:1]', '[cH:2]', '[cH:3]', '[cH:4]', '[c:5]', '[C:6]', '[NH2:7]', '[O:8]']

This list will exactly match the output SMILES atom order.
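
If you need the terms keyed by the original atom index rather than by output position, one possible sketch is to capture the atom map number as well. This assumes every atom term has the simple form [symbol:map], as in the output above; ATOM_TERM and atoms_by_index are names made up for this example:

```python
import re

# Mapped SMILES string taken from the output shown above.
mapped_smiles = "[cH:0]1[cH:1][cH:2][cH:3][cH:4][c:5]1[C:6]([NH2:7])=[O:8]"

# Capture the atom text before ':' and the map number after it.
# Assumption: every bracketed term carries a ':<number>' atom map.
ATOM_TERM = re.compile(r"\[([^\]:]+):(\d+)\]")

atoms_by_index = {
    int(map_number): term
    for term, map_number in ATOM_TERM.findall(mapped_smiles)
}

for idx in sorted(atoms_by_index):
    print(idx, atoms_by_index[idx])
```

Because the lookup is keyed by the map number, it does not depend on the order in which the atoms appear in the output SMILES.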

Cheers,


Andrew
da...@dalkescientific.com






Re: [Rdkit-discuss] how to get indexes and atoms with H from smiles

2023-05-09 Thread Wim Dehaen
Hi,
I think that if you only need "H" and the hydrogen count appended, by far
the easiest way is to append them to the symbol string yourself. See the
code block below:

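from rdkit import Chem

def get_symbol_with_Hs(a):
  # Element symbol, e.g. "C" (GetSymbol() does not lowercase aromatic atoms)
  symbol=a.GetSymbol()
  charge=a.GetFormalCharge()
  hcount=a.GetTotalNumHs()
  # Append "H", plus the count when there is more than one hydrogen
  if hcount > 0:
    symbol+="H"
  if hcount > 1:
    symbol+=str(hcount)
  # Append the formal charge: "+", "-", "(+2)", "(-2)", ...
  if charge==1:
    symbol+="+"
  if charge==-1:
    symbol+="-"
  if charge > 1:
    symbol+=f"(+{charge})"
  if charge < -1:
    # charge is already negative, so do not add another "-"
    symbol+=f"({charge})"
  return symbol

mol=Chem.MolFromSmiles('c1ccccc(C(N)=O)1')
for i, atom in enumerate(mol.GetAtoms()):
  atom.SetProp('molAtomMapNumber',str(i))
  print(i,get_symbol_with_Hs(atom))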

-
Another way I would recommend is to use SMILES with explicit (i.e.
bracketed) hydrogens instead. For your use case, I would imagine it as
follows:

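from rdkit import Chem
mol=Chem.MolFromSmiles('c1ccccc(C(N)=O)1')
mol=Chem.AddHs(mol)
rwmol=Chem.RWMol(mol)
# Cut every bond between two heavy atoms, so that each heavy atom
# stays attached only to its own hydrogens
for b in list(rwmol.GetBonds()):
  ba=b.GetBeginAtom()
  ea=b.GetEndAtom()
  if ba.GetAtomicNum()!=1 and ea.GetAtomicNum()!=1:
    rwmol.RemoveBond(ba.GetIdx(),ea.GetIdx())
# One fragment per heavy atom; sanitization is skipped because cut
# fragments (e.g. a lone aromatic atom) would likely fail it
frags=Chem.GetMolFrags(rwmol, asMols=True,sanitizeFrags=False)
for i,f in enumerate(frags):
  print(i,Chem.MolToSmiles(f))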

this would output

0 [H]c
1 [H]c
2 [H]c
3 [H]c
4 [H]c
5 c
6 C
7 [H]N[H]
8 O


i hope that helps.

best wishes
wim



[Rdkit-discuss] how to get indexes and atoms with H from smiles

2023-05-08 Thread Haijun Feng


Hi All,

I am trying to add atom numbers to a SMILES string, as below:

from rdkit import Chem
mol=Chem.MolFromSmiles('c1ccccc(C(N)=O)1')
for i, atom in enumerate(mol.GetAtoms()):
  atom.SetProp('molAtomMapNumber',str(i))
smi=Chem.MolToSmiles(mol)
print(smi)

the output is: [cH:0]1[cH:1][cH:2][cH:3][cH:4][c:5]1C:6=[O:8]

Then I want to split the SMILES into atoms. I did it like this:

from rdkit import Chem
mol=Chem.MolFromSmiles('c1ccccc(C(N)=O)1')
for i, atom in enumerate(mol.GetAtoms()):
  atom.SetProp('molAtomMapNumber',str(i))
  print(i,atom.GetSymbol())

the output is:

0 C
1 C
2 C
3 C
4 C
5 C
6 C
7 N
8 O

But what I do want is something like this (with fragments instead of
atoms):

0 cH
1 CH
...
7 NH2
8 O

Can anyone help me figure out how to get each atom with H from the smiles
as above. Thanks so much!


best,


Hal