Re: [Rdkit-discuss] question about morgan bits

2021-03-11 Thread Wendong Wang
Great. Thanks, Greg.

Best wishes,
Wendong

On Fri, Mar 12, 2021 at 2:55 PM Greg Landrum  wrote:

>
> On Fri, Mar 12, 2021 at 7:48 AM Wendong Wang 
> wrote:
>
>> PS. By the way, how did you index all the atoms when you draw the
>> molecule?
>>
>
> MolDraw2D has a drawing option to add atom indices.
> If you're using the RDKit in Jupyter with IPythonConsole, you can do:
> IPythonConsole.drawOptions.addAtomIndices=True
> [image: image.png]
>
>
>
>  -greg
>
> On Fri, Mar 12, 2021 at 2:28 PM Wendong Wang 
>> wrote:
>>
>>> Hi Greg,
>>> Thanks for the explanation and the references.
>>>
>>> Best wishes,
>>> Wendong
>>>
>>> On Fri, Mar 12, 2021 at 1:58 PM Greg Landrum 
>>> wrote:
>>>
 Hi Wendong,

 The morgan fingerprint algorithm removes redundant atom environments
 (environments which contain exactly the same atoms/bonds).
 For example, when looking at valine:
 [image: image.png]
 The environments with radius 2 which are centered on atoms 5 and 6 are
 redundant with the environment of radius 1 which is centered on atom 4, so
 those environments are not reported in the output.

 This is described in more detail in the ECFP paper: Rogers, D.; Hahn,
 M. “Extended-Connectivity Fingerprints.” *J. Chem. Inf. and Model.*
 **50**:742-54 (2010).

 Best,
 -greg

 On Fri, Mar 12, 2021 at 4:16 AM Wendong Wang 
 wrote:

> Greetings,
> I have a question about morgan fingerprints. The code is pasted at the
> end of the email, and please see the attached images for the results.
>
> For valine molecule, the radius is set to be 2. The dictionary (atom
> index, radius) shows all the substructures of all atoms with radius 0 as
> fingerprints, and all the substructures of all the atoms with radius 1 as
> fingerprints. But there are only a few substructures with radius 2 as
> fingerprints. Why so few?
>
> Thanks.
>
> Best wishes,
> Wendong
>
> PS. The code is below:
> m1 = Chem.MolFromSmiles('CC(C)[C@@H](C(=O)O)N')
> di1 = {}
> fp1 = AllChem.GetMorganFingerprintAsBitVect(m1, radius = 2, nBits =
> 2048, bitInfo = di1)
> tu1 = [(m1, x, di1) for x in fp1.GetOnBits()]
> Draw.DrawMorganBits(tu1, molsPerRow = 4, legends=[str(x) for x in
> fp1.GetOnBits()])
>
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>

___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] question about morgan bits

2021-03-11 Thread Greg Landrum
On Fri, Mar 12, 2021 at 7:48 AM Wendong Wang 
wrote:

> PS. By the way, how did you index all the atoms when you draw the
> molecule?
>

MolDraw2D has a drawing option to add atom indices.
If you're using the RDKit in Jupyter with IPythonConsole, you can do:
IPythonConsole.drawOptions.addAtomIndices=True
[image: image.png]
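
Outside of a notebook, the same option can be set on a MolDraw2D drawer directly. The following is a minimal sketch, assuming an RDKit release recent enough to provide the addAtomIndices drawing option; the valine SMILES comes from elsewhere in this thread and the canvas size is an arbitrary choice.

```python
from rdkit import Chem
from rdkit.Chem.Draw import rdMolDraw2D

# Valine, the molecule discussed in this thread.
mol = Chem.MolFromSmiles('CC(C)[C@@H](C(=O)O)N')

# SVG drawer; 300x300 is an arbitrary canvas size chosen for illustration.
drawer = rdMolDraw2D.MolDraw2DSVG(300, 300)
drawer.drawOptions().addAtomIndices = True

# PrepareAndDrawMolecule generates 2D coordinates if needed, then draws.
rdMolDraw2D.PrepareAndDrawMolecule(drawer, mol)
drawer.FinishDrawing()
svg = drawer.GetDrawingText()  # SVG text that can be written to a file
```

The indices shown this way are the RDKit atom indices (0-based), which are the ones reported in the bitInfo dictionary.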



 -greg

On Fri, Mar 12, 2021 at 2:28 PM Wendong Wang 
> wrote:
>
>> Hi Greg,
>> Thanks for the explanation and the references.
>>
>> Best wishes,
>> Wendong
>>
>> On Fri, Mar 12, 2021 at 1:58 PM Greg Landrum 
>> wrote:
>>
>>> Hi Wendong,
>>>
>>> The morgan fingerprint algorithm removes redundant atom environments
>>> (environments which contain exactly the same atoms/bonds).
>>> For example, when looking at valine:
>>> [image: image.png]
>>> The environments with radius 2 which are centered on atoms 5 and 6 are
>>> redundant with the environment of radius 1 which is centered on atom 4, so
>>> those environments are not reported in the output.
>>>
>>> This is described in more detail in the ECFP paper: Rogers, D.; Hahn, M.
>>> “Extended-Connectivity Fingerprints.” *J. Chem. Inf. and Model.*
>>> **50**:742-54 (2010).
>>>
>>> Best,
>>> -greg
>>>
>>> On Fri, Mar 12, 2021 at 4:16 AM Wendong Wang 
>>> wrote:
>>>
 Greetings,
 I have a question about morgan fingerprints. The code is pasted at the
 end of the email, and please see the attached images for the results.

 For valine molecule, the radius is set to be 2. The dictionary (atom
 index, radius) shows all the substructures of all atoms with radius 0 as
 fingerprints, and all the substructures of all the atoms with radius 1 as
 fingerprints. But there are only a few substructures with radius 2 as
 fingerprints. Why so few?

 Thanks.

 Best wishes,
 Wendong

 PS. The code is below:
 m1 = Chem.MolFromSmiles('CC(C)[C@@H](C(=O)O)N')
 di1 = {}
 fp1 = AllChem.GetMorganFingerprintAsBitVect(m1, radius = 2, nBits =
 2048, bitInfo = di1)
 tu1 = [(m1, x, di1) for x in fp1.GetOnBits()]
 Draw.DrawMorganBits(tu1, molsPerRow = 4, legends=[str(x) for x in
 fp1.GetOnBits()])



Re: [Rdkit-discuss] question about morgan bits

2021-03-11 Thread Greg Landrum
Hi Wendong,

The morgan fingerprint algorithm removes redundant atom environments
(environments which contain exactly the same atoms/bonds).
For example, when looking at valine:
[image: image.png]
The environments with radius 2 which are centered on atoms 5 and 6 are
redundant with the environment of radius 1 which is centered on atom 4, so
those environments are not reported in the output.

This is described in more detail in the ECFP paper: Rogers, D.; Hahn, M.
“Extended-Connectivity Fingerprints.” *J. Chem. Inf. and Model.*
**50**:742-54 (2010).
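
The effect can be checked without drawing anything by grouping the (atom index, radius) pairs from bitInfo by radius. This is a small sketch that reuses the call from the original question; the variable names are illustrative only.

```python
from collections import defaultdict

from rdkit import Chem
from rdkit.Chem import AllChem

mol = Chem.MolFromSmiles('CC(C)[C@@H](C(=O)O)N')  # valine

# bit_info is filled as {bit id: ((center atom index, radius), ...)}.
bit_info = {}
AllChem.GetMorganFingerprintAsBitVect(mol, radius=2, nBits=2048,
                                      bitInfo=bit_info)

# Collect the center atoms that were reported at each radius.
centers_by_radius = defaultdict(set)
for bit_id, environments in bit_info.items():
    for atom_idx, radius in environments:
        centers_by_radius[radius].add(atom_idx)

for radius in sorted(centers_by_radius):
    print(radius, sorted(centers_by_radius[radius]))
```

Every atom should appear as a center at radius 0, while the radius-2 list should be shorter; atoms 5 and 6 are expected to be absent from it for the reason given above.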

Best,
-greg

On Fri, Mar 12, 2021 at 4:16 AM Wendong Wang 
wrote:

> Greetings,
> I have a question about morgan fingerprints. The code is pasted at the end
> of the email, and please see the attached images for the results.
>
> For valine molecule, the radius is set to be 2. The dictionary (atom
> index, radius) shows all the substructures of all atoms with radius 0 as
> fingerprints, and all the substructures of all the atoms with radius 1 as
> fingerprints. But there are only a few substructures with radius 2 as
> fingerprints. Why so few?
>
> Thanks.
>
> Best wishes,
> Wendong
>
> PS. The code is below:
> m1 = Chem.MolFromSmiles('CC(C)[C@@H](C(=O)O)N')
> di1 = {}
> fp1 = AllChem.GetMorganFingerprintAsBitVect(m1, radius = 2, nBits = 2048,
> bitInfo = di1)
> tu1 = [(m1, x, di1) for x in fp1.GetOnBits()]
> Draw.DrawMorganBits(tu1, molsPerRow = 4, legends=[str(x) for x in
> fp1.GetOnBits()])
>


Re: [Rdkit-discuss] chair vs boat detection

2021-03-11 Thread Greg Landrum
Hi Ling,

The RDKit does not currently have such a function (at least not that I'm
aware of).
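
In the absence of a built-in function, a geometry check of the kind mentioned in the question could be sketched from the ring torsions. This is only a heuristic under stated assumptions: the function names and the two thresholds (min_pucker, flat_tol) are hypothetical choices, not RDKit features, and the idealized patterns used are a chair with six torsions of alternating sign and a boat with two near-zero torsions on opposite bonds.

```python
from rdkit import Chem
from rdkit.Chem import AllChem, rdMolTransforms


def ring_torsions(mol, ring, conf_id=-1):
    """Endocyclic torsions (degrees) of a ring given as ordered atom indices."""
    conf = mol.GetConformer(conf_id)
    n = len(ring)
    return [
        rdMolTransforms.GetDihedralDeg(
            conf, ring[i], ring[(i + 1) % n], ring[(i + 2) % n], ring[(i + 3) % n])
        for i in range(n)
    ]


def classify_six_ring(torsions, min_pucker=30.0, flat_tol=15.0):
    """Label a six-membered ring as 'chair', 'boat' or 'other' (heuristic)."""
    if len(torsions) != 6:
        raise ValueError('expected six ring torsions')
    # Chair: all torsions clearly non-zero and alternating in sign.
    puckered = all(abs(t) >= min_pucker for t in torsions)
    alternating = all(torsions[i] * torsions[(i + 1) % 6] < 0 for i in range(6))
    if puckered and alternating:
        return 'chair'
    # Boat: two opposite torsions near zero, the other four clearly puckered.
    for i in range(3):
        flat_pair = (abs(torsions[i]) <= flat_tol
                     and abs(torsions[i + 3]) <= flat_tol)
        others = [t for j, t in enumerate(torsions) if j not in (i, i + 3)]
        if flat_pair and all(abs(t) >= min_pucker for t in others):
            return 'boat'
    return 'other'  # twist-boat, half-chair, planar, ...


# Usage on an embedded cyclohexane conformer.
mol = Chem.MolFromSmiles('C1CCCCC1')
ring = mol.GetRingInfo().AtomRings()[0]   # ordered heavy-atom indices
mol_h = Chem.AddHs(mol)                   # hydrogens are appended; indices kept
AllChem.EmbedMolecule(mol_h, randomSeed=42)
label = classify_six_ring(ring_torsions(mol_h, ring))
print(label)
```

The thresholds would need tuning for real data, and anything that is neither a clear chair nor a clear boat is deliberately reported as 'other'.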

-greg


On Fri, Mar 12, 2021 at 6:13 AM Ling Chan  wrote:

> Hello colleagues,
>
> Just wonder if there is any function to distinguish between a chair ring
> and a boat ring?
>
> Don't worry if there is no such utility. I can write my own geometry
> detection. Just that I don't want to reinvent the wheel.
>
> Thank you.
>
> Ling
>


[Rdkit-discuss] chair vs boat detection

2021-03-11 Thread Ling Chan
Hello colleagues,

Just wonder if there is any function to distinguish between a chair ring
and a boat ring?

Don't worry if there is no such utility. I can write my own geometry
detection. Just that I don't want to reinvent the wheel.

Thank you.

Ling


[Rdkit-discuss] question about morgan bits

2021-03-11 Thread Wendong Wang
Greetings,
I have a question about morgan fingerprints. The code is pasted at the end
of the email, and please see the attached images for the results.

For the valine molecule, the radius is set to 2. The bitInfo dictionary of
(atom index, radius) pairs shows the substructures of all atoms at radius 0
and of all atoms at radius 1 as fingerprint bits. But only a few
substructures at radius 2 appear as fingerprint bits. Why so few?

Thanks.

Best wishes,
Wendong

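PS. The code is below:
from rdkit import Chem
from rdkit.Chem import AllChem, Draw

m1 = Chem.MolFromSmiles('CC(C)[C@@H](C(=O)O)N')
di1 = {}  # filled with {bit id: ((atom index, radius), ...)}
fp1 = AllChem.GetMorganFingerprintAsBitVect(m1, radius=2, nBits=2048,
                                            bitInfo=di1)
# one (molecule, bit id, bit info) tuple per bit that is set
tu1 = [(m1, x, di1) for x in fp1.GetOnBits()]
Draw.DrawMorganBits(tu1, molsPerRow=4,
                    legends=[str(x) for x in fp1.GetOnBits()])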


Re: [Rdkit-discuss] Error in RDKit output for finding ring atoms!

2021-03-11 Thread Ling Chan
Hello Goutam,
There are not even 30 atoms in your smiles string. This is because, for
example, [C@H] denotes one carbon atom. The H is there to describe the C.
It is not an atom in the smiles string.
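
This can be confirmed with a short check, sketched here with the SMILES from the original post; hydrogens written inside brackets are stored as counts on their parent atom rather than as atoms of their own.

```python
from rdkit import Chem

smiles = ('C[S+](CC[C@H](N)C([O-])=O)C[C@H]1O[C@H]([C@H](O)[C@@H]1O)'
          'N1C=NC2=C1N=CN=C2N')
mol = Chem.MolFromSmiles(smiles)

# Only heavy atoms are atoms of the molecule graph.
print(mol.GetNumAtoms())

# Index, element and attached hydrogen count for each atom.
for atom in mol.GetAtoms():
    print(atom.GetIdx(), atom.GetSymbol(), atom.GetTotalNumHs())
```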
Ling


On Thu, Mar 11, 2021 at 8:35 AM Goutam Mukherjee wrote:

> Dear Members,
>
> I have found an error in RDKit output. I am not sure whether it is my
> mistake.
> I have a SMILES code of a molecule:
> C[S+](CC[C@H](N)C([O-])=O)C[C@H]1O[C@H]([C@H](O)[C@@H]1O)N1C=NC2=C1N=CN=C2N
>
> The 3D coordinates of the molecule are attached herewith.
>
> I ran the following commands:
>
> In [1]: from rdkit import Chem
> In [2]: m = Chem.MolFromSmiles('C[S+](CC[C@H](N)C([O-])=O)C[C@H]1O[C@H]([C@H](O)[C@@H]1O)N1C=NC2=C1N=CN=C2N')
> In [3]: ri = m.GetRingInfo()
> In [4]: print(ri.AtomRings())
> ((10, 11, 12, 13, 15), (18, 17, 21, 20, 19), (22, 23, 24, 25, 20, 21))
>
> Here the reported atom ranks do not correspond to the ranks of the ring atoms.
> This molecule contains two five-membered rings and one six-membered ring.
> The true atom ranks would be
> {(11, 13, 14, 16, 19), (22, 23, 24, 25, 26), (25, 26, 27, 28, 29, 30)}
>
> Could anyone please suggest how I can get the correct ranks of the atoms
> that are part of a ring?
>
> Thanks and Best Regards,
> Goutam
>


Re: [Rdkit-discuss] Error in RDKit output for finding ring atoms!

2021-03-11 Thread Goutam Mukherjee
Dear Ivan,

Many thanks for your reply.
Yes, when I give the 3D molecule with hydrogens added (molecule.pdb) as
input, the result remains the same.
For example, if I give a benzene molecule (C1=CC=CC=C1), or another complex
molecule, as input, it prints the correct atom ranks whether or not hydrogen
atoms are present.
In [1]:  from rdkit import Chem
In [2]:  m = Chem.MolFromSmiles('C1=CC=CC=C1')
In [3]: ri = m.GetRingInfo()
In [4]:  print(ri.AtomRings())
*((0, 5, 4, 3, 2, 1),)*


Thanks and Best Regards,
Goutam


On Thu, Mar 11, 2021 at 5:49 PM Ivan Tubert-Brohman <
ivan.tubert-broh...@schrodinger.com> wrote:

> Hi Goutam,
>
> The ring atoms reported by RDKit in your example are correct; you just
> need to consider that the atom indexes correspond to the position of each
> atom in the SMILES string. How could RDKit guess the index that the atom
> might have in a PDB file that's not even being read in your example?
>
> I'm guessing maybe in your real use case you did read the PDB file. It is
> possible that the atoms got renumbered, for example if the hydrogens were
> deleted in the process.
>
> Hope this helps,
> Ivan
>
> On Thu, Mar 11, 2021 at 11:35 AM Goutam Mukherjee 
> wrote:
>
>> Dear Members,
>>
>> I have found an error in RDKit output. I am not sure whether it is my
>> mistake.
>> I have a SMILES code of a molecule:
>> C[S+](CC[C@H](N)C([O-])=O)C[C@H]1O[C@H]([C@H](O)[C@@H]1O)N1C=NC2=C1N=CN=C2N
>>
>> The 3D coordinates of the molecule are attached herewith.
>>
>> I ran the following commands:
>>
>> In [1]: from rdkit import Chem
>> In [2]: m = Chem.MolFromSmiles('C[S+](CC[C@H](N)C([O-])=O)C[C@H]1O[C@H]([C@H](O)[C@@H]1O)N1C=NC2=C1N=CN=C2N')
>> In [3]: ri = m.GetRingInfo()
>> In [4]: print(ri.AtomRings())
>> ((10, 11, 12, 13, 15), (18, 17, 21, 20, 19), (22, 23, 24, 25, 20, 21))
>>
>> Here the reported atom ranks do not correspond to the ranks of the ring atoms.
>> This molecule contains two five-membered rings and one six-membered ring.
>> The true atom ranks would be
>> {(11, 13, 14, 16, 19), (22, 23, 24, 25, 26), (25, 26, 27, 28, 29, 30)}
>>
>> Could anyone please suggest how I can get the correct ranks of the atoms
>> that are part of a ring?
>>
>> Thanks and Best Regards,
>> Goutam
>>


Re: [Rdkit-discuss] Error in RDKit output for finding ring atoms!

2021-03-11 Thread Ivan Tubert-Brohman
Hi Goutam,

The ring atoms reported by RDKit in your example are correct; you just need
to consider that the atom indexes correspond to the position of each atom
in the SMILES string. How could RDKit guess the index that the atom might
have in a PDB file that's not even being read in your example?

I'm guessing maybe in your real use case you did read the PDB file. It is
possible that the atoms got renumbered, for example if the hydrogens were
deleted in the process.
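
The indexing can be made visible by printing the element next to each ring index, and by checking that adding hydrogens leaves the heavy-atom indices alone. This is a sketch using the SMILES from the original post; it does not read the attached PDB file.

```python
from rdkit import Chem

smiles = ('C[S+](CC[C@H](N)C([O-])=O)C[C@H]1O[C@H]([C@H](O)[C@@H]1O)'
          'N1C=NC2=C1N=CN=C2N')
mol = Chem.MolFromSmiles(smiles)

# Atom indices are 0-based and follow the order of atoms in the SMILES.
for ring in mol.GetRingInfo().AtomRings():
    print([(idx, mol.GetAtomWithIdx(idx).GetSymbol()) for idx in ring])

# AddHs appends the hydrogens after the existing atoms, so the heavy-atom
# indices, and therefore the ring indices, do not change.
mol_h = Chem.AddHs(mol)
rings_h = [tuple(r) for r in Chem.GetSymmSSSR(mol_h)]
print(rings_h)
```

If the numbering of the PDB file is what matters, the molecule would have to be built from that file (for example with Chem.MolFromPDBFile and removeHs=False) rather than from the SMILES, so that the atom order of the file is the one RDKit sees.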

Hope this helps,
Ivan

On Thu, Mar 11, 2021 at 11:35 AM Goutam Mukherjee 
wrote:

> Dear Members,
>
> I have found an error in RDKit output. I am not sure whether it is my
> mistake.
> I have a SMILES code of a molecule:
> C[S+](CC[C@H](N)C([O-])=O)C[C@H]1O[C@H]([C@H](O)[C@@H]1O)N1C=NC2=C1N=CN=C2N
>
> The 3D coordinates of the molecule are attached herewith.
>
> I ran the following commands:
>
> In [1]: from rdkit import Chem
> In [2]: m = Chem.MolFromSmiles('C[S+](CC[C@H](N)C([O-])=O)C[C@H]1O[C@H]([C@H](O)[C@@H]1O)N1C=NC2=C1N=CN=C2N')
> In [3]: ri = m.GetRingInfo()
> In [4]: print(ri.AtomRings())
> ((10, 11, 12, 13, 15), (18, 17, 21, 20, 19), (22, 23, 24, 25, 20, 21))
>
> Here the reported atom ranks do not correspond to the ranks of the ring atoms.
> This molecule contains two five-membered rings and one six-membered ring.
> The true atom ranks would be
> {(11, 13, 14, 16, 19), (22, 23, 24, 25, 26), (25, 26, 27, 28, 29, 30)}
>
> Could anyone please suggest how I can get the correct ranks of the atoms
> that are part of a ring?
>
> Thanks and Best Regards,
> Goutam
>


[Rdkit-discuss] Error in RDKit output for finding ring atoms!

2021-03-11 Thread Goutam Mukherjee
Dear Members,

I have found an error in RDKit output. I am not sure whether it is my
mistake.
I have a SMILES code of a molecule:
C[S+](CC[C@H](N)C([O-])=O)C[C@H]1O[C@H]([C@H](O)[C@@H]1O)N1C=NC2=C1N=CN=C2N

The 3D coordinates of the molecule are attached herewith.

I ran the following commands:

In [1]: from rdkit import Chem
In [2]: m = Chem.MolFromSmiles('C[S+](CC[C@H](N)C([O-])=O)C[C@H]1O[C@H]([C@H](O)[C@@H]1O)N1C=NC2=C1N=CN=C2N')
In [3]: ri = m.GetRingInfo()
In [4]: print(ri.AtomRings())
((10, 11, 12, 13, 15), (18, 17, 21, 20, 19), (22, 23, 24, 25, 20, 21))

Here the reported atom ranks do not correspond to the ranks of the ring atoms.
This molecule contains two five-membered rings and one six-membered ring.
The true atom ranks would be
{(11, 13, 14, 16, 19), (22, 23, 24, 25, 26), (25, 26, 27, 28, 29, 30)}

Could anyone please suggest how I can get the correct ranks of the atoms
that are part of a ring?

Thanks and Best Regards,
Goutam


molecule.pdb
Description: Binary data


Re: [Rdkit-discuss] explicit H atoms

2021-03-11 Thread Jean-Marc Nuzillard

Hi,

It seems that the answer to my initial question depends on how strictly the
prescription

"stereobonds between stereocenters should be avoided at all costs"
is followed.

Many thanks!

Jean-Marc


On 10/03/2021 at 20:24, Ling Chan wrote:

Hello Mark,

I thought you could depict it like the attached, since only the narrow 
end of a wedged bond counts. Sure, it is confusing, but it is doable. 
Except that in section ST-0.5 of the IUPAC guidelines pointed out by Greg
https://www.degruyter.com/document/doi/10.1351/pac200678101897/html 

it says that "stereobonds between stereocenters should be avoided at 
all costs".


Just that there are times when "stereobonds between stereocenters" are 
not avoidable, if all four neighbours of a chiral carbon are 
themselves chiral.


I guess for practical purposes having H's could make things clearer, 
but in theory you may not need them for chiral atoms.


Ling


On Wed, Mar 10, 2021 at 2:33 AM Mark Mackey via Rdkit-discuss wrote:


I believe it's not possible to represent the chirality of the
attached molecule's ring fusion carbons without using an explicit H.

Regards,
Mark

--
Mark Mackey
Chief Scientific Officer
Cresset
New Cambridge House, Bassingbourn Road, Litlington,
Cambridgeshire, SG8 0SS, UK
tel: +44 (0)1223 858890    mobile: +44 (0)7595 099165    fax: +44
(0)1223 853667
email: m...@cresset-group.com   
web: www.cresset-group.com    skype:
mark_cresset



-Original Message-
From: Jean-Marc Nuzillard <jm.nuzill...@univ-reims.fr>
Sent: 08 March 2021 13:55
To: RDKit Discuss <rdkit-discuss@lists.sourceforge.net>
Subject: [Rdkit-discuss] explicit H atoms

Dear all,

my question of the day is more general than directly related to
RDKit but the link is indirect.

Is it always possible to represent an organic molecule in 2D with
all necessary configuration hints (bond wedges pointing to the
front or to the back) without introducing any explicit hydrogen atom?

Maybe my question is very naive; my apologies in advance for
that.

Best,

Jean-Marc

--
Jean-Marc Nuzillard
Directeur de Recherches au CNRS

Institut de Chimie Moléculaire de Reims
CNRS UMR 7312
Moulin de la Housse
CPCBAI, Bâtiment 18
BP 1039
51687 REIMS Cedex 2
France

Tel : 03 26 91 82 10
Fax : 03 26 91 31 66
http://www.univ-reims.fr/icmr 
http://eos.univ-reims.fr/LSD/CSNteam.html


http://www.univ-reims.fr/LSD/ 
http://www.univ-reims.fr/LSD/JmnSoft/











--
Jean-Marc Nuzillard
Directeur de Recherches au CNRS

Institut de Chimie Moléculaire de Reims
CNRS UMR 7312
Moulin de la Housse
CPCBAI, Bâtiment 18
BP 1039
51687 REIMS Cedex 2
France

Tel : 03 26 91 82 10
Fax : 03 26 91 31 66
http://www.univ-reims.fr/icmr
http://eos.univ-reims.fr/LSD/CSNteam.html