Re: [Rdkit-discuss] how to get indexes and atoms with H from smiles

2023-05-09 Thread Francois Berenger

Hello,

Maybe you can use this:
Chem.MolToSmiles(mol, allHsExplicit=True)

This will place each heavy atom between '[' and ']' and give you the
number of hydrogens for each.
SMILES strings get easier to work with after this (you no longer need
a full-blown SMILES parser).
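
As a rough illustration of why the all-bracket form is easy to handle, the sketch below pulls the element symbol and hydrogen count out of each bracket atom with a single regular expression. The sample string is hand-written in the shape that allHsExplicit=True is expected to produce for benzamide (an assumption, not captured RDKit output), and parse_bracket_atoms is a hypothetical helper name. It deliberately ignores isotopes, chirality marks, and charges.

```python
import re

# One bracket atom: element symbol, optional H count, optional ":map" suffix.
# Simplification (assumption): no isotopes, chirality marks, or charges.
BRACKET_ATOM = re.compile(r"\[([A-Z][a-z]?|[a-z]{1,2})(?:H(\d*))?(?::\d+)?\]")

def parse_bracket_atoms(smiles):
    """Return (symbol, hydrogen_count) for each bracket atom, in string order."""
    atoms = []
    for match in BRACKET_ATOM.finditer(smiles):
        symbol, hcount = match.group(1), match.group(2)
        if hcount is None:      # no "H" inside the brackets
            n_h = 0
        elif hcount == "":      # a bare "H" means exactly one hydrogen
            n_h = 1
        else:
            n_h = int(hcount)
        atoms.append((symbol, n_h))
    return atoms

# Hand-written sample in the all-bracket style (illustrative assumption).
sample = "[NH2][C](=[O])[c]1[cH][cH][cH][cH][cH]1"
for index, (symbol, n_h) in enumerate(parse_bracket_atoms(sample)):
    print(index, symbol, n_h)
```

Note that the indexes printed here follow the order of atoms in the string, which is not necessarily the atom order of the original molecule.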

Regards,
F.

On 09/05/2023 14:55, Haijun Feng wrote:

[1]

Hi All,

I am trying to add atom numbers to a SMILES string, as below:

from rdkit import Chem
mol=Chem.MolFromSmiles('c1ccccc(C(N)=O)1')
for i, atom in enumerate(mol.GetAtoms()):
  atom.SetProp('molAtomMapNumber',str(i))
smi=Chem.MolToSmiles(mol)
print(smi)

the output is: [cH:0]1[cH:1][cH:2][cH:3][cH:4][c:5]1C:6=[O:8]

then I want to split the smiles into atoms, I did it like this:

from rdkit import Chem
mol=Chem.MolFromSmiles('c1ccccc(C(N)=O)1')
for i, atom in enumerate(mol.GetAtoms()):
  atom.SetProp('molAtomMapNumber',str(i))
  print(i,atom.GetSymbol())

the output is:

0 C
1 C
2 C
3 C
4 C
5 C
6 C
7 N
8 O

But what I do want is something like this (with fragments instead of
atoms):

0 cH
1 CH
...
7 NH2
8 O

Can anyone help me figure out how to get each atom with H from the
smiles as above. Thanks so much!

best,

Hal

Links:
--
[1] https://stackoverflow.com/posts/76197437/timeline
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss





Re: [Rdkit-discuss] how to get indexes and atoms with H from smiles

2023-05-09 Thread Andrew Dalke
On May 9, 2023, at 07:55, Haijun Feng  wrote:
> Can anyone help me figure out how to get each atom with H from the smiles as 
> above. Thanks so much!

Try using Chem.MolFragmentToSmiles to get the SMILES for each atom, with all 
hydrogens explicit, then strip off the leading and trailing []s.

from rdkit import Chem
mol=Chem.MolFromSmiles('c1ccccc(C(N)=O)1')
for i, atom in enumerate(mol.GetAtoms()):
  # SMILES for this one atom only, with its hydrogens written explicitly
  atom_smi = Chem.MolFragmentToSmiles(mol, allHsExplicit=True,
                                      atomsToUse=[atom.GetIdx()])
  print(i, atom_smi.strip("[]"))

This prints

0 cH
1 cH
2 cH
3 cH
4 cH
5 c
6 C
7 NH2
8 O

Your code showed you using

   atom.SetProp('molAtomMapNumber',str(i))

In the following, I'll set that property *after* getting the atom SMILES, so 
the map is not included as part of the output:

from rdkit import Chem
mol=Chem.MolFromSmiles('c1ccccc(C(N)=O)1')
for i, atom in enumerate(mol.GetAtoms()):
  atom_smi = Chem.MolFragmentToSmiles(mol, allHsExplicit=True,
                                      atomsToUse=[atom.GetIdx()])
  print(i, atom_smi.strip("[]"))
  # set the map number only after the atom SMILES has been generated
  atom.SetIntProp("molAtomMapNumber", i)

print(Chem.MolToSmiles(mol))

which gives the output

0 cH
1 cH
2 cH
3 cH
4 cH
5 c
6 C
7 NH2
8 O
[cH:0]1[cH:1][cH:2][cH:3][cH:4][c:5]1[C:6]([NH2:7])=[O:8]



> the output is: [cH:0]1[cH:1][cH:2][cH:3][cH:4][c:5]1C:6=[O:8]

For what it's worth, I get the slightly different:

   [cH:0]1[cH:1][cH:2][cH:3][cH:4][c:5]1[C:6]([NH2:7])=[O:8] 

You should be aware that the input order and the output SMILES order might be 
different.

Because of the simpler structure of your preferred output SMILES format, you 
can alternatively extract the atom terms from the output string by looking for 
the substrings inside of the []s, as in the following:

>>> import re
>>> re.compile(r'\[[^]]+\]').findall("[cH:0]1[cH:1][cH:2][cH:3][cH:4][c:5]1[C:6]([NH2:7])=[O:8]")
['[cH:0]', '[cH:1]', '[cH:2]', '[cH:3]', '[cH:4]', '[c:5]', '[C:6]', '[NH2:7]', '[O:8]']

This list will exactly match the output SMILES atom order.
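
Since the output SMILES order may differ from the input order, one option is to key each extracted term by its atom map number rather than by its position in the string. The following is a minimal sketch of that idea using only the standard library; atoms_by_map_number is a hypothetical helper name, and the second test string is a hand-written reordering of the same molecule (an assumption for illustration, not RDKit output).

```python
import re

# Matches one atom-mapped bracket atom, e.g. "[NH2:7]":
# group 1 is the atom text ("NH2"), group 2 is the map number ("7").
MAPPED_ATOM = re.compile(r"\[([^\]:]+):(\d+)\]")

def atoms_by_map_number(smiles):
    """Return {map_number: atom_text} for every mapped bracket atom."""
    return {int(num): text for text, num in MAPPED_ATOM.findall(smiles)}

smi = "[cH:0]1[cH:1][cH:2][cH:3][cH:4][c:5]1[C:6]([NH2:7])=[O:8]"
labels = atoms_by_map_number(smi)
for idx in sorted(labels):
    print(idx, labels[idx])

# Hand-written reordering of the same atoms (illustrative assumption):
# the lookup by map number does not depend on the order in the string.
reordered = "[O:8]=[C:6]([NH2:7])[c:5]1[cH:4][cH:3][cH:2][cH:1][cH:0]1"
print(atoms_by_map_number(reordered) == labels)
```

Because the map numbers were set from the input atom indexes, sorting by map number recovers the input order.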

Cheers,


Andrew
da...@dalkescientific.com






Re: [Rdkit-discuss] how to get indexes and atoms with H from smiles

2023-05-09 Thread Wim Dehaen
Hi,
I think if you simply need H and the H count appended, by far the
easiest way is to append them to the symbol string yourself. See the code block below:

from rdkit import Chem

def get_symbol_with_Hs(a):
    # Build a label such as "NH2" from the element symbol,
    # the total hydrogen count, and the formal charge.
    symbol=a.GetSymbol()
    charge=a.GetFormalCharge()
    hcount=a.GetTotalNumHs()
    if hcount > 0:
        symbol+="H"
        if hcount > 1:
            symbol+=str(hcount)
    if charge==1:
        symbol+="+"
    if charge==-1:
        symbol+="-"
    if charge > 1:
        symbol+=f"(+{charge})"
    if charge < -1:
        # charge is already negative, so use abs() to avoid "(--2)"
        symbol+=f"(-{abs(charge)})"
    return symbol

mol=Chem.MolFromSmiles('c1ccccc(C(N)=O)1')
for i, atom in enumerate(mol.GetAtoms()):
    atom.SetProp('molAtomMapNumber',str(i))
    print(i,get_symbol_with_Hs(atom))

-
Another way I would recommend is to use SMILES with explicit (i.e.
bracketed) hydrogens instead. For your use case I would imagine this as follows:
from rdkit import Chem
mol=Chem.MolFromSmiles('c1ccccc(C(N)=O)1')
mol=Chem.AddHs(mol)
rwmol=Chem.RWMol(mol)
# cut every bond between two heavy atoms, keeping the bonds to hydrogens
for b in list(rwmol.GetBonds()):
    ba=b.GetBeginAtom()
    ea=b.GetEndAtom()
    if ba.GetAtomicNum()!=1 and ea.GetAtomicNum()!=1:
        rwmol.RemoveBond(ba.GetIdx(),ea.GetIdx())
# each remaining fragment is one heavy atom plus its hydrogens
frags=Chem.GetMolFrags(rwmol, asMols=True,sanitizeFrags=False)
for i,f in enumerate(frags):
    print(i,Chem.MolToSmiles(f))

this would output

0 [H]c
1 [H]c
2 [H]c
3 [H]c
4 [H]c
5 c
6 C
7 [H]N[H]
8 O


i hope that helps.

best wishes
wim



Re: [Rdkit-discuss] Molfile from smiles

2023-05-09 Thread Santiago Fraga
Thank you for your answer.
I will try the new version.

Regards
Santiago

From: David Cosgrove
Sent: Thursday, May 4, 2023 11:29
To: Santiago Fraga
Cc: Wim Dehaen; RDKit Discuss
Subject: Re: [Rdkit-discuss] Molfile from smiles

As part of the work on improving the way RDKit handles organometallics that is 
in the latest release, there is MolOps::cleanUpOrganometallics, which attempts 
to do the bond transformations in a similar way to that gist.  The intention 
was that this would be part of the default sanitization, but late in the day it 
was discovered that it didn't work well with compounds with 2 metal atoms and 
bridging chlorine atoms, such as 'F[Pd]1(Cl)Cl->[Pd](Cl)(Cl)<-Cl1'.  It's my 
intention to fix that at some point in the near future, but in the meantime if 
you're working in C++ it is available for use with caveats.  Worth a try in 
this case, perhaps.
Dave


On Thu, May 4, 2023 at 9:51 AM Santiago Fraga <santi...@mestrelab.com> wrote:
Good morning Wim
  Yes, I know that the original SMILES has problems with the dative bonds.
  I am trying to load the molecule and then fix those bonds using this solution:
  https://gist.github.com/greglandrum/6cd7aadcdedb1ebcafa9537e8a47e3a4

  And then generate a new molfile. I will try to apply your code to see if
  I can improve the molecule depiction.

 Regards
 Santiago


SANTIAGO FRAGA
Software Developer
santi...@mestrelab.com

MESTRELAB RESEARCH S.L.
PHONE +34881976775
FAX +34981941079
Feliciano Barrera, 9B-Bajo 15706
Santiago de Compostela (SPAIN)





From: Wim Dehaen <wimdeh...@gmail.com>
Sent: Tuesday, May 2, 2023 21:37
To: Santiago Fraga <santi...@mestrelab.com>
Cc: Ling Chan <lingtrek...@gmail.com>; RDKit Discuss <rdkit-discuss@lists.sourceforge.net>
Subject: Re: [Rdkit-discuss] Molfile from smiles

Hi all,
unfortunately I can't offer a "fix", but I can offer these minor comments:
-it seems like the SMILES has a parsing error. You can make use of RDKit's 
extension for dative bonds in SMILES ("->") and replace the SMILES with the 
one below, which will parse and give (what I assume is) the intended structure:
"C1=CC2=[N](C=C1)->[Ir]134(C5=CC=CC=C25)C2=CC=CC=C2C2=[N]->1C=CC=C2.C1=CC(C2=CC=CC=C32)=[N]->4C=C1"
-more fundamentally, I think the reason this molecule is hard to render is 
that, as a hexacoordinate iridium complex, it is inherently 3-dimensional 
and therefore tougher to sketch. You can see here on Wikipedia that Ir(ppy)3, 
even when manually sketched, looks a bit funny:
https://upload.wikimedia.org/wikipedia/commons/c/c8/Ir%28ppy%29_Schematic.png
-in general, organometallic species have various limitations when it comes to 
their handling by cheminformatics packages. For this reason, some care is 
needed when dealing with species like this to make sure you won't have issues 
down the line. For an overview of some RDKit-related ones, see this 
presentation by Prof. Jan Jensen: 
https://raw.githubusercontent.com/rdkit/UGM_2020/master/Presentations/JanJensen.pdf

Finally, if I embed the molecule and then display its 2D projection, it 
actually looks pretty good (despite a warning that UFF doesn't recognize iridium). 
See below:
[image.png]
This was generated using the following code block (in Python, not C++, sorry for 
that):

from rdkit import Chem
from rdkit.Chem import AllChem

mol = Chem.MolFromSmiles("C1=CC2=[N](C=C1)->[Ir]134(C5=CC=CC=C25)C2=CC=CC=C2C2=[N]->1C=CC=C2.C1=CC(C2=CC=CC=C32)=[N]->4C=C1",sanitize=True)
mol = Chem.AddHs(mol)
# generate 3D coordinates; the fixed seed makes the embedding reproducible
AllChem.EmbedMolecule(mol,randomSeed=0xf00d)
mol = Chem.RemoveHs(mol)
display(mol)  # display() is provided by IPython/Jupyter

best wishes
wim

On Tue, May 2, 2023 at 5:06 PM Santiago Fraga <santi...@mestrelab.com> wrote:
Thanks for your answer, Ling Chan.
But I am already using that option with the C++ API.

Regards
Santiago
