Re: [Rdkit-discuss] Hydrogens not recognised as Dummy Atoms?

2021-07-09 Thread Adelene LAI
Makes sense, thank you Ivan!

Doctoral Researcher
Environmental Cheminformatics
UNIVERSITÉ DU LUXEMBOURG

Campus Belval | Luxembourg Centre for Systems Biomedicine
6, avenue du Swing, L-4367 Belvaux
T +356 46 66 44 67 18
https://adelenel.ai

From: Ivan Tubert-Brohman 
Sent: 08 July 2021 19:02:10
To: Adelene LAI
Cc: rdkit-discuss
Subject: Re: [Rdkit-discuss] Hydrogens not recognised as Dummy Atoms?

Hi Adelene,

You can't match an atom that doesn't exist as a node in the molecular graph, so 
if you really want to match a hydrogen, you'll have to add explicit hydrogens 
to your molecule:

molh = Chem.AddHs(mol)
molh.HasSubstructMatch(q1)
> True

However, if all you want to know is whether the oxygen is next to a hydrogen, 
you can make the hydrogen count a property of the oxygen atom by using SMARTS:

q3s = 'CCOCCOCCC[OH]'
q3 = Chem.MolFromSmarts(q3s)
mol.HasSubstructMatch(q3)
> True
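
The two suggestions above can be compared side by side. This is only a sketch using the molecule from the original question. For the explicit-hydrogen case it swaps in a SMARTS query ending in `*` instead of the `q1` built with AdjustQueryProperties; that substitution is an assumption made to keep the example short.

```python
from rdkit import Chem

mol = Chem.MolFromSmiles('CCOCCOCCCO')

# Option 1: make the hydrogens real nodes in the graph, then match them.
# In SMARTS, '*' matches any atom, including an explicit hydrogen.
molh = Chem.AddHs(mol)
q_any = Chem.MolFromSmarts('CCOCCOCCCO*')
without_hs = mol.HasSubstructMatch(q_any)   # implicit Hs are not graph nodes
with_hs = molh.HasSubstructMatch(q_any)     # the hydroxyl H can match '*'

# Option 2: keep hydrogens implicit and put the H count on the oxygen.
q_oh = Chem.MolFromSmarts('CCOCCOCCC[OH]')
oh_match = mol.HasSubstructMatch(q_oh)

print(without_hs, with_hs, oh_match)
```

Option 2 avoids growing the molecule with AddHs, which may matter when screening many structures.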

Hope this helps,
Ivan
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


[Rdkit-discuss] Hydrogens not recognised as Dummy Atoms?

2021-07-08 Thread Adelene LAI
Hi RDKit Community,


I've observed that hydrogens are not recognised as dummy atoms when trying to 
do substructure matching.


Is there a way to make them so? Ideally without adding explicitHs.


-


from rdkit import Chem

smi = 'CCOCCOCCCO'
mol = Chem.MolFromSmiles(smi)

# with dummy atom
q1s = 'CCOCCOCCCO*'
q1m = Chem.MolFromSmiles(q1s)
q1 = Chem.AdjustQueryProperties(q1m)
# MolToSmarts gives [#6]-[#6]-[#8]-[#6]-[#6]-[#8]-[#6]-[#6]-[#6]-[#8]-*
mol.HasSubstructMatch(q1)
> False

# without dummy atom
q2s = 'CCOCCOCCCO'
q2m = Chem.MolFromSmiles(q2s)
q2 = Chem.AdjustQueryProperties(q2m)
# MolToSmarts gives [#6]-[#6]-[#8]-[#6]-[#6]-[#8]-[#6]-[#6]-[#6]-[#8]
mol.HasSubstructMatch(q2)
> True



Thanks!
Adelene


[Rdkit-discuss] RDKit Error capturing: Chem.WrapLogs unexpected result

2021-06-18 Thread Adelene LAI
Hi RDKit Community,


I'm trying to run a script in the command line without having any RDKit 
warnings or errors show up in the CL. Instead, I want them written into a 
log.txt



from rdkit import Chem
from contextlib import redirect_stderr

Chem.WrapLogs()

with open('log.txt', 'w') as f:
    with redirect_stderr(f):
        mol = Chem.MolFromSmiles("c1c")



The error does indeed get written to the log.txt file (I'm assuming warnings 
would be too).

What is strange is that the stderr still shows up in the command line, even 
though I'd already redirected it to the log.txt.

Does this mean that there are two stderr streams? How is this possible?


I've been reading several old posts on this topic, but none really fits my 
problem:

https://sourceforge.net/p/rdkit/mailman/rdkit-discuss/thread/CAOC-GK0oTH36vvL7eVyWMJg4zmERpqctonrgNnxG10QmgYXhdg%40mail.gmail.com/#msg36030331

https://sourceforge.net/p/rdkit/mailman/message/33261506/  <- addressed by 
WrapLogs() I believe

http://rdkit.blogspot.com/2016/03/capturing-error-information.html

https://github.com/rdkit/rdkit/pull/739

Would appreciate any ideas.
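
One point that may help here: contextlib.redirect_stderr only rebinds Python's sys.stderr object. It does not touch the process-level file descriptor 2 that C++ code writes to. If Chem.WrapLogs() forwards RDKit's messages to sys.stderr while the original C++ stream stays active (an assumption, not something confirmed in this thread), the same message would appear both in log.txt and on the console. A minimal sketch of redirecting at the file-descriptor level instead, for POSIX-like systems; the helper name redirect_fd_stderr is made up for illustration:

```python
import os
import sys
from contextlib import contextmanager


@contextmanager
def redirect_fd_stderr(path):
    """Send everything written to file descriptor 2 to the file at `path`."""
    sys.stderr.flush()
    saved_fd = os.dup(2)               # keep a handle on the real stderr
    with open(path, 'w') as log:
        os.dup2(log.fileno(), 2)       # point fd 2 at the log file
        try:
            yield
        finally:
            sys.stderr.flush()
            os.dup2(saved_fd, 2)       # restore the real stderr
            os.close(saved_fd)


# Hypothetical usage with RDKit (not executed in this sketch):
#     from rdkit import Chem
#     with redirect_fd_stderr('log.txt'):
#         mol = Chem.MolFromSmiles("c1c")
```

Because this works below the Python level, it should capture output regardless of whether WrapLogs() has been called.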

Thanks,
Adelene


Re: [Rdkit-discuss] Removing hydrogen atoms without neighbors

2021-01-21 Thread Adelene LAI
Hi Navid,


Could you give an example?


Are you trying to remove disconnected H-atoms from the mol object?


A

From: Navid Shervani-Tabar 
Sent: 20 January 2021 17:36:14
To: RDKit Discuss
Subject: [Rdkit-discuss] Removing hydrogen atoms without neighbors

Dear all,

I was wondering if there is a function to remove "hydrogen atoms without 
neighbors" from the mol object. Thanks!

Regards,
Navid
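
Assuming "hydrogen atoms without neighbors" means H atoms with no bonds at all, one possible approach is to delete them through an editable copy of the molecule. This is a sketch; the function name remove_isolated_hydrogens and the example SMILES are made up for illustration.

```python
from rdkit import Chem


def remove_isolated_hydrogens(mol):
    """Return a copy of `mol` without hydrogen atoms that have no bonds."""
    rw = Chem.RWMol(mol)
    isolated = [atom.GetIdx() for atom in rw.GetAtoms()
                if atom.GetAtomicNum() == 1 and atom.GetDegree() == 0]
    # Delete from the highest index down so remaining indices stay valid.
    for idx in sorted(isolated, reverse=True):
        rw.RemoveAtom(idx)
    return rw.GetMol()


mol = Chem.MolFromSmiles('CCO.[H]')   # ethanol plus a disconnected H atom
cleaned = remove_isolated_hydrogens(mol)
print(Chem.MolToSmiles(cleaned))
```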


Re: [Rdkit-discuss] Simple question about double bond stereo in molblock output

2020-12-29 Thread Adelene LAI
Hi James,


Interesting problem!


You are right that the bond you are interested in has STEREONONE, but I am not 
sure that STEREONONE necessarily translates to having a bond stereo value of 0. 
(philosophical question?)


https://gist.github.com/adelenelai/0e2c4c90f33bac9197d7a11495b4f164


Like the example of FC=CF I showed (undefined stereochemistry), it could be 
that having symmetry on the double bond is treated as a case of undefined 
stereochemistry, and hence double bond stereo value = 3.
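
To look at both values together, here is a sketch that reads the stereo flag from the bond object and the stereo field from the V2000 bond block for James's molecule. The helper stereo_field is made up for illustration, and the molblock value may depend on the RDKit version, so no output is shown.

```python
from rdkit import Chem

m = Chem.MolFromSmiles('FC(F)=CC1=CC=CC=C1')

# The exocyclic C=C bond joins atoms 1 and 3 (0-based, SMILES order).
bond = m.GetBondBetweenAtoms(1, 3)
print(bond.GetBondType(), bond.GetStereo())

# Locate the bond block: three header lines, the counts line, then the atoms.
lines = Chem.MolToMolBlock(m).splitlines()
n_atoms = int(lines[3][0:3])
n_bonds = int(lines[3][3:6])
bond_lines = lines[4 + n_atoms: 4 + n_atoms + n_bonds]


def stereo_field(line):
    """Return ((atom1, atom2), stereo) from a V2000 bond line (1-based atoms)."""
    field = line[9:12].strip()
    pair = tuple(sorted((int(line[0:3]), int(line[3:6]))))
    return pair, int(field) if field else 0


molblock_stereo = dict(stereo_field(line) for line in bond_lines)
print(molblock_stereo[(2, 4)])   # same bond, 1-based molblock numbering
```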


Warm regards,


Adelene

From: James Davidson 
Sent: Tuesday, December 22, 2020 11:46:32 AM
To: rdkit-discuss@lists.sourceforge.net
Subject: [Rdkit-discuss] Simple question about double bond stereo in molblock 
output


Dear All,



I wonder if I can quickly sanity-check something(?).



I have noticed that symmetrical double bonds output with a bond stereo setting 
of “3” (cis or trans (either) double bond) in the standard molblock output.

Is this expected/intentional?  I would have expected a setting of “0” (use 
coords to determine cis or trans) for a non-stereo double bond.

(I am using 2020.09.1)



Here’s a simple example:



from rdkit import Chem

m = Chem.MolFromSmiles('FC(F)=CC1=CC=CC=C1')
print(Chem.MolToMolBlock(m))

     RDKit          2D

 10 10  0  0  0  0  0  0  0  0999 V2000
    5.2500   -1.2990    0.0000 F   0  0  0  0  0  0  0  0  0  0  0  0
    3.7500   -1.2990    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.0000   -2.5981    0.0000 F   0  0  0  0  0  0  0  0  0  0  0  0
    3.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.5000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.7500   -1.2990    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.7500   -1.2990    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.5000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.7500    1.2990    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.7500    1.2990    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0
  2  3  1  0
  2  4  2  3
  4  5  1  0
  5  6  2  0
  6  7  1  0
  7  8  2  0
  8  9  1  0
  9 10  2  0
 10  5  1  0
M  END





This behaviour is maybe what I would expect if the bond was explicitly set 
using bond.SetStereo(Chem.BondStereo.STEREOANY), but in the absence of this I 
would expect the bond to default to STEREONONE, and I guess I would expect this 
to be bond stereo “0” in the output molblock.  What am I missing?



Kind regards



James


Re: [Rdkit-discuss] Partial substructure match?

2020-11-20 Thread Adelene LAI
Hi Gustavo,


> Doesn't the substructure match only work for the whole substructure, as an 
> all-or-nothing?
>
> Is it possible to get a partial match with substructure search?


I think it would depend on how you specify your query SMARTS. There's 
probably a way to do a partial substructure search using SMARTS, but it really 
depends on your goal and what you want to 'allow' as a match.


MCSS might be the better way to go but at this point, more example molecules 
and 'allowed matches' would help :)
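
As a rough sketch of the MCSS route, one possible definition of "% match" is the number of atoms in the maximum common substructure divided by the number of atoms in the query. That definition, and the helper name mcs_fraction, are assumptions for illustration, not an existing RDKit feature.

```python
from rdkit import Chem
from rdkit.Chem import rdFMCS


def mcs_fraction(query, target):
    """Fraction of the query's atoms covered by the MCS with the target."""
    result = rdFMCS.FindMCS([query, target])
    return result.numAtoms / query.GetNumAtoms()


naphthalene = Chem.MolFromSmiles('c1ccc2ccccc2c1')
benzene = Chem.MolFromSmiles('c1ccccc1')
fraction = mcs_fraction(naphthalene, benzene)
print(f'{fraction:.0%}')
```

With default FindMCS settings the benzene ring maps onto one ring of naphthalene, so this corresponds to 6 of 10 atoms, i.e. the 60% from Gustavo's example. Options such as ring-matching constraints would change what counts as "allowed".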



A

From: Rajarshi Guha 
Sent: Friday, November 20, 2020 4:16:11 PM
To: Gustavo Seabra
Cc: Dan Nealschneider; Adelene LAI; RDKit Discuss
Subject: Re: [Rdkit-discuss] Partial substructure match?

One approach could be to assign scoring functions for bond and atom matches 
(such as what OE supports: 
https://docs.eyesopen.com/toolkits/python/oechemtk/patternmatch.html#mcs-scoring-functions).

On Fri, Nov 20, 2020 at 9:58 AM Gustavo Seabra wrote:
Hi Adelene,

Doesn't the substructure match only work for the whole substructure, as an 
all-or-nothing?

I suppose I could use the MCSS and count the number of matching atoms,  then 
calculate the percentage match myself.

Is it possible to get a partial match with substructure search?

Gustavo.

--
Gustavo Seabra

________
From: Adelene LAI
Sent: Friday, November 20, 2020 9:13:15 AM
To: Dan Nealschneider; Gustavo Seabra
Cc: RDKit Discuss
Subject: Re: [Rdkit-discuss] Partial substructure match?


Hi Dan and Gustavo,


MCSS sounds good, but depends on the goal.


From the way Gustavo wrote, it sounds like a Query-Target substructure search: 
he has a list of targets and one specific query, and he wants to compare the 
matching rate amongst the members of the list.


If so, I would try query SMARTS.

https://www.rdkit.org/docs/GettingStartedInPython.html#substructure-searching


Regarding the % substructure match, interesting question. How would you 
quantify that? Not sure such a thing exists in RDKit right now.


Adelene

From: Dan Nealschneider
Sent: Thursday, November 19, 2020 6:01:37 PM
To: Gustavo Seabra
Cc: RDKit Discuss
Subject: Re: [Rdkit-discuss] Partial substructure match?

Gustavo -
That sounds like the "maximum common substructure" problem. Here's the relevant 
section in RDKit's  "Getting started in Python"

https://www.rdkit.org/docs/GettingStartedInPython.html#maximum-common-substructure



dan nealschneider | lead developer


On Thu, Nov 19, 2020 at 8:50 AM Gustavo Seabra wrote:
Hi all,

Is it possible to search for *partial* substructure matches using RDKit?

I'm aware of "HasSubstructMatch/ GetSubstructMatch", but my impression is
that it only returns full matches (100%) of the required pattern in a
structure.

However, what I'd like to do is a bit different: Imagine I have one specific
substructure (scaffold), and I'd like to search for molecules that have the
full substructure *or part of it*, and maybe get the percentage of the
substructure match? (100% = the full substructure is contained in the
molecule). For example, if the pattern is a naphthalene and the molecule to
search has a benzene, that would count as a 60% match.

Is there a way to do that in RDKit?

Thanks a lot!
--
Gustavo Seabra






--
Rajarshi Guha | http://blog.rguha.net | @rguha<https://twitter.com/rguha>



Re: [Rdkit-discuss] Partial substructure match?

2020-11-20 Thread Adelene LAI
Hi Dan and Gustavo,


MCSS sounds good, but depends on the goal.


From the way Gustavo wrote, it sounds like a Query-Target substructure search: 
he has a list of targets and one specific query, and he wants to compare the 
matching rate amongst the members of the list.


If so, I would try query SMARTS.

https://www.rdkit.org/docs/GettingStartedInPython.html#substructure-searching
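
A minimal sketch of that query-target setup: one SMARTS query run against a small list of targets, with the hit rate computed at the end. The target names and SMILES are made-up examples.

```python
from rdkit import Chem

# One query (here a naphthalene scaffold written as SMARTS) ...
query = Chem.MolFromSmarts('c1ccc2ccccc2c1')

# ... and a list of targets to screen.
targets = {
    'naphthalene': 'c1ccc2ccccc2c1',
    '1-naphthol': 'Oc1cccc2ccccc12',
    'benzene': 'c1ccccc1',
}

hits = {}
for name, smi in targets.items():
    mol = Chem.MolFromSmiles(smi)
    # A substructure match is all-or-nothing for the whole query.
    hits[name] = mol is not None and mol.HasSubstructMatch(query)

match_rate = sum(hits.values()) / len(hits)
print(hits, match_rate)
```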


Regarding the % substructure match, interesting question. How would you 
quantify that? Not sure such a thing exists in RDKit right now.


Adelene

From: Dan Nealschneider 
Sent: Thursday, November 19, 2020 6:01:37 PM
To: Gustavo Seabra
Cc: RDKit Discuss
Subject: Re: [Rdkit-discuss] Partial substructure match?

Gustavo -
That sounds like the "maximum common substructure" problem. Here's the relevant 
section in RDKit's  "Getting started in Python"

https://www.rdkit.org/docs/GettingStartedInPython.html#maximum-common-substructure



dan nealschneider | lead developer


On Thu, Nov 19, 2020 at 8:50 AM Gustavo Seabra wrote:
Hi all,

Is it possible to search for *partial* substructure matches using RDKit?

I'm aware of "HasSubstructMatch/ GetSubstructMatch", but my impression is
that it only returns full matches (100%) of the required pattern in a
structure.

However, what I'd like to do is a bit different: Imagine I have one specific
substructure (scaffold), and I'd like to search for molecules that have the
full substructure *or part of it*, and maybe get the percentage of the
substructure match? (100% = the full substructure is contained in the
molecule). For example, if the pattern is a naphthalene and the molecule to
search has a benzene, that would count as a 60% match.

Is there a way to do that in RDKit?

Thanks a lot!
--
Gustavo Seabra






[Rdkit-discuss] A postdoc position at the intersection of cheminformatics and life cycle assessment

2020-11-12 Thread Adelene LAI
Hi RDKit Community,


Please see the below post-doc opening and feel free to share.


Regards,

Adelene

From: Wang Zhanyun (IfU, ESD) 
Sent: Thursday, November 12, 2020 9:58 AM
To: Adelene LAI
Subject: A postdoc position at the intersection of cheminformatics and life 
cycle assessment

Dear Adelene,

Hope this email finds you well and all is well despite this difficult time. I 
am writing to you with regard to a postdoc opening in our group: 
https://jobs.ethz.ch/job/view/JOPG_ethz_zDBtS2UDFZ3HTDz5Gd Could you please 
forward this email to whoever may be interested?


The project intends to explore cheminformatics approaches in life cycle 
assessment. More specifically, it aims to develop new machine/deep 
learning-based tools that can predict chemicals’ environmental impacts (climate 
change potential, energy demand, etc.) from their molecular structures so as to 
enable chemists to design molecules with as low environmental impacts as 
possible. The project will be built on an existing tool, Finechem based on 10 
molecular descriptors and neural network technique; the Finechem tool has quite 
some limitations that we would hope to overcome in this project.

The postdoc will be three years at the Chair of Ecological Systems Design 
(https://esd.ifu.ethz.ch) at ETH Zurich, Switzerland, and is a part of the 
Swiss National Centres of Competence in Research on Sustainable Chemical 
Processes through Catalysis (https://www.nccr-catalysis.ch).


Many thanks, and have a good day! Stay healthy and well!

Best regards,
Zhanyun

--

Zhanyun Wang, Dr.
Oberassistent / Lecturer

ETH Zürich
Institute of Environmental Engineering (IfU)
Ecological Systems Design
HPZ E 32.1
John-von-Neumann-Weg 9
CH-8093 Zürich
Switzerland

phone: 0041-44-6337066
zhanyun.w...@chem.ethz.ch
zhanyun.w...@ifu.baug.ethz.ch
google scholar: http://goo.gl/Ektq6S


Re: [Rdkit-discuss] Nitrogen sp2 isomers get the same InChI Key

2020-10-30 Thread Adelene LAI
On the bright side, I won't lose time 
generating InChIs...

Can I trust that the same molecule will always get the same canonical SMILES 
from RDKit, independent of how it is read? (Different SDF files, geometries, 
atom orders, etc.?)

All the best,
Gustavo.


--
Gustavo Seabra.


On Sun, Oct 25, 2020 at 8:27 PM Peter S. Shenkin wrote:
Canonical SMILES is probably the way to go, but you might also be able to use 
the InchiKey and the Inchi auxiliary information together as a compound hash 
key.

-P.

On Sun, Oct 25, 2020 at 10:53 AM Adelene LAI wrote:

Hi Gustavo,


(Sorry, forgot to reply all before...)


Your deduplication task is quite familiar to me and something I do quite a lot 
of in my own work ;)


Can I suggest deduplicating using Canonical SMILES?


It doesn't solve your InChIKey issue, but it is a solution for now.


I updated my gist to show that it is feasible:


https://gist.github.com/adelenelai/59a8794e1f030941c19bcb50aa8adf3f


Adelene

From: Gustavo Seabra
Sent: Sunday, October 25, 2020 2:27:15 PM
To: Adelene LAI
Subject: Re: [Rdkit-discuss] Nitrogen sp2 isomers get the same InChI Key

Actually,  I was trying to generate all stereoisomers for molecules in a 
database,  and filter duplicate molecules by using the InChI Key to detect 
duplicates.  But it gives cis/trans isomers on sp2-N the same Key.

Gustavo.

--
Gustavo Seabra


From: Adelene LAI
Sent: Sunday, October 25, 2020 1:44:01 AM
To: Gustavo Seabra
Subject: Re: [Rdkit-discuss] Nitrogen sp2 isomers get the same InChI Key


Hi Gustavo,


It occurred to me while swimming yesterday - was there a reason you pointed out 
the hybridisation state of N in your original subject text?


Was it just to specify which N to focus on, or did you expect something special 
about sp2 hybridisation wrt InChIKey?


Adelene

From: Gustavo Seabra
Sent: Saturday, October 24, 2020 5:37:09 AM
To: RDKit Discuss; Adelene LAI
Subject: Re: [Rdkit-discuss] Nitrogen sp2 isomers get the same InChI Key

Thanks for looking into it. I'm happy to see it wasn't just a mistake by me ;-)

I hope we can find what's wrong there.

Best,
Gustavo.

--
Gustavo Seabra


From: Adelene LAI
Sent: Friday, October 23, 2020 11:28:55 PM
To: Gustavo Seabra; RDKit Discuss
Subject: Re: [Rdkit-discuss] Nitrogen sp2 isomers get the same InChI Key


Hi Gustavo,


https://gist.github.com/adelenelai/59a8794e1f030941c19bcb50aa8adf3f


In the gist above, I tried doing some further investigating.


It seems for the example you gave, the rdkit functions indeed give the same 
inchikey and inchi, but different aux info.


Why this different aux info doesn't translate into different inchikeys/inchis, 
I'm not sure.


Adelene

From: Gustavo Seabra
Sent: Friday, October 23, 2020 6:43:07 PM
To: RDKit Discuss
Subject: [Rdkit-discuss] Nitrogen sp2 isomers get the same InChI Key

Hi all,

I ran into an issue here, and I'd appreciate your input. I noticed that 
compounds that differ only in the cis-trans isomerization around an sp2 
nitrogen get the same InChI Key from RDKit. For example:

> inchi_cis = Chem.inchi.MolToInchiKey(Chem.MolFromSmiles("C/N=C(/NC#N)NCCSCc1nc[nH]c1C"))
> inchi_cis
'AQIXAKUUQRKLND-UHFFFAOYSA-N'

> inchi_trans = Chem.inchi.MolToInchiKey(Chem.MolFromSmiles("C/N=C(\\NC#N)NCCSCc1nc[nH]c1C"))
> inchi_trans
'AQIXAKUUQRKLND-UHFFFAOYSA-N'

> inchi_cis == inchi_trans
True

Re: [Rdkit-discuss] Nitrogen sp2 isomers get the same InChI Key

2020-10-25 Thread Adelene LAI
Hi Gustavo,


(Sorry, forgot to reply all before...)


Your deduplication task is quite familiar to me and something I do quite a lot 
of in my own work ;)


Can I suggest deduplicating using Canonical SMILES?


It doesn't solve your InChIKey issue, but it is a solution for now.


I updated my gist to show that it is feasible:


https://gist.github.com/adelenelai/59a8794e1f030941c19bcb50aa8adf3f
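
For readers without the gist at hand, the idea can be sketched as follows. This is not the gist's code; the helper name canonical and the re-rooted duplicate are made up for illustration.

```python
from rdkit import Chem


def canonical(smi):
    """Canonical SMILES for an input SMILES (isomeric by default)."""
    return Chem.MolToSmiles(Chem.MolFromSmiles(smi))


cis = 'C/N=C(/NC#N)NCCSCc1nc[nH]c1C'
trans = 'C/N=C(\\NC#N)NCCSCc1nc[nH]c1C'

# The same cis molecule written from a different starting atom, to mimic a
# duplicate that arrives with another atom order.
cis_rerooted = Chem.MolToSmiles(Chem.MolFromSmiles(cis),
                                canonical=False, rootedAtAtom=5)

# Deduplicate: keep the first input seen for each canonical SMILES.
unique = {}
for smi in (cis, trans, cis_rerooted):
    unique.setdefault(canonical(smi), smi)

print(len(unique))
for can_smi in unique:
    print(can_smi)
```

The two stereoisomers stay distinct because the canonical SMILES keeps the double-bond configuration, while the re-rooted copy collapses onto the first entry.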


Adelene

From: Gustavo Seabra 
Sent: Sunday, October 25, 2020 2:27:15 PM
To: Adelene LAI
Subject: Re: [Rdkit-discuss] Nitrogen sp2 isomers get the same InChI Key

Actually,  I was trying to generate all stereoisomers for molecules in a 
database,  and filter duplicate molecules by using the InChI Key to detect 
duplicates.  But it gives cis/trans isomers on sp2-N the same Key.

Gustavo.

--
Gustavo Seabra


From: Adelene LAI 
Sent: Sunday, October 25, 2020 1:44:01 AM
To: Gustavo Seabra 
Subject: Re: [Rdkit-discuss] Nitrogen sp2 isomers get the same InChI Key


Hi Gustavo,


It occurred to me while swimming yesterday - was there a reason you pointed out 
the hybridisation state of N in your original subject text?


Was it just to specify which N to focus on, or did you expect something special 
about sp2 hybridisation wrt InChIKey?


Adelene

From: Gustavo Seabra 
Sent: Saturday, October 24, 2020 5:37:09 AM
To: RDKit Discuss; Adelene LAI
Subject: Re: [Rdkit-discuss] Nitrogen sp2 isomers get the same InChI Key

Thanks for looking into it. I'm happy to see it wasn't just a mistake by me ;-)

I hope we can find what's wrong there.

Best,
Gustavo.

--
Gustavo Seabra


From: Adelene LAI 
Sent: Friday, October 23, 2020 11:28:55 PM
To: Gustavo Seabra ; RDKit Discuss 

Subject: Re: [Rdkit-discuss] Nitrogen sp2 isomers get the same InChI Key


Hi Gustavo,


https://gist.github.com/adelenelai/59a8794e1f030941c19bcb50aa8adf3f


In the gist above, I tried doing some further investigating.


It seems for the example you gave, the rdkit functions indeed give the same 
inchikey and inchi, but different aux info.


Why this different aux info doesn't translate into different inchikeys/inchis, 
I'm not sure.


Adelene

From: Gustavo Seabra 
Sent: Friday, October 23, 2020 6:43:07 PM
To: RDKit Discuss
Subject: [Rdkit-discuss] Nitrogen sp2 isomers get the same InChI Key

Hi all,

I ran into an issue here, and I'd appreciate your input. I noticed that 
compounds that differ only in the cis-trans isomerization around an sp2 
nitrogen get the same InChI Key from RDKit. For example:

> inchi_cis = Chem.inchi.MolToInchiKey(Chem.MolFromSmiles("C/N=C(/NC#N)NCCSCc1nc[nH]c1C"))
> inchi_cis
'AQIXAKUUQRKLND-UHFFFAOYSA-N'

> inchi_trans = Chem.inchi.MolToInchiKey(Chem.MolFromSmiles("C/N=C(\\NC#N)NCCSCc1nc[nH]c1C"))
> inchi_trans
'AQIXAKUUQRKLND-UHFFFAOYSA-N'

> inchi_cis == inchi_trans
True

I wonder if this is a limitation of the InChI Key definition, or an 
implementation issue.

Thanks a lot,
--
Gustavo Seabra.


Re: [Rdkit-discuss] Nitrogen sp2 isomers get the same InChI Key

2020-10-23 Thread Adelene LAI
Hi Gustavo,


https://gist.github.com/adelenelai/59a8794e1f030941c19bcb50aa8adf3f


In the gist above, I tried doing some further investigating.


It seems for the example you gave, the rdkit functions indeed give the same 
inchikey and inchi, but different aux info.


Why this different aux info doesn't translate into different inchikeys/inchis, 
I'm not sure.
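
The comparison can be reproduced roughly like this. It is a sketch, not the gist's code, and it assumes an RDKit build with InChI support. One possible explanation, offered as an assumption rather than something established in this thread, is that standard InChI treats the N=C(N)N group as having mobile hydrogens and so does not encode the C=N configuration.

```python
from rdkit import Chem
from rdkit.Chem import inchi

cis = Chem.MolFromSmiles('C/N=C(/NC#N)NCCSCc1nc[nH]c1C')
trans = Chem.MolFromSmiles('C/N=C(\\NC#N)NCCSCc1nc[nH]c1C')

records = {}
for label, mol in (('cis', cis), ('trans', trans)):
    inchi_str, aux_info = inchi.MolToInchiAndAuxInfo(mol)
    records[label] = {
        'inchi': inchi_str,
        'key': inchi.InchiToInchiKey(inchi_str),
        'aux': aux_info,
    }

# Report, field by field, whether the two isomers agree.
for field in ('inchi', 'key', 'aux'):
    same = records['cis'][field] == records['trans'][field]
    print(field, 'identical' if same else 'different')
```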


Adelene

From: Gustavo Seabra 
Sent: Friday, October 23, 2020 6:43:07 PM
To: RDKit Discuss
Subject: [Rdkit-discuss] Nitrogen sp2 isomers get the same InChI Key

Hi all,

I ran into an issue here, and I'd appreciate your input. I noticed that 
compounds that differ only in the cis-trans isomerization around an sp2 
nitrogen get the same InChI Key from RDKit. For example:

> inchi_cis = Chem.inchi.MolToInchiKey(Chem.MolFromSmiles("C/N=C(/NC#N)NCCSCc1nc[nH]c1C"))
> inchi_cis
'AQIXAKUUQRKLND-UHFFFAOYSA-N'

> inchi_trans = Chem.inchi.MolToInchiKey(Chem.MolFromSmiles("C/N=C(\\NC#N)NCCSCc1nc[nH]c1C"))
> inchi_trans
'AQIXAKUUQRKLND-UHFFFAOYSA-N'

> inchi_cis == inchi_trans
True

I wonder if this is a limitation of the InChI Key definition, or an 
implementation issue.

Thanks a lot,
--
Gustavo Seabra.


Re: [Rdkit-discuss] How to preserve undefined stereochemistry?

2020-10-23 Thread Adelene LAI
Hi Dave,


Understood, but I actually meant distinguishing between the mol objects of the 
unspecified vs. unknown stereochem forms, not their SMILES.


Since Paolo is proposing the option for both unspecified and unknown to be 
depicted as crossed bonds (and since both forms would have the same underlying 
SMILES), the only way the user could distinguish them would be to check 
bond.GetStereo.
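
A small sketch of that check, using but-2-ene as a stand-in example: one copy is left as parsed (unspecified) and the other is explicitly marked as unknown. The helper name double_bond_stereo is made up for illustration.

```python
from rdkit import Chem

# Same SMILES twice: one left as parsed, one explicitly marked as "unknown".
unspecified = Chem.MolFromSmiles('CC=CC')
unknown = Chem.MolFromSmiles('CC=CC')
unknown.GetBondWithIdx(1).SetStereo(Chem.BondStereo.STEREOANY)


def double_bond_stereo(mol):
    """Return the stereo flag of every double bond, keyed by bond index."""
    return {bond.GetIdx(): bond.GetStereo() for bond in mol.GetBonds()
            if bond.GetBondType() == Chem.BondType.DOUBLE}


print(double_bond_stereo(unspecified))
print(double_bond_stereo(unknown))
```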


Adelene

From: David Cosgrove 
Sent: Thursday, October 22, 2020 12:46:54 PM
To: Adelene LAI
Cc: Greg Landrum; Paolo Tosco; rdkit-discuss
Subject: Re: [Rdkit-discuss] How to preserve undefined stereochemistry?

Hi Adelene,
In SMILES, there’s no way of distinguishing between unknown and unspecified. 
Technically in a SMILES string it’s either specified or unspecified. In an SDF 
you can also say you have a Rumsfeldian “known unknown”.

Dave

On Thu, 22 Oct 2020 at 10:07, Adelene LAI wrote:

Dear Paolo,



Thanks for updating the gist - it's a really important resource for me and 
probably future RDKit beginners too. Thanks.


I like your suggestion to add the unspecifiedBondStereoMeansUnknown flag to 
SmilesParserParams. I think this way  circumvents having to do a SS-match + 
BondStereo replacement loop.


To clarify, will implementing the above effectively mean unspecified stereo 
will be depicted as a crossed double bond too?


Because then, the only way to differentiate between stereo unspecified and 
stereo unknown would be to run bond.GetStereo(), which would give STEREOANY or 
STEREONONE respectively. I think this would be OK...unless depiction-folks have 
alternative suggestions.


Adelene

From: Paolo Tosco
Sent: Wednesday, October 21, 2020 10:56:24 AM
To: Adelene LAI
Cc: Greg Landrum; rdkit-discuss
Subject: Re: [Rdkit-discuss] How to preserve undefined stereochemistry?

Hi Adelene, Greg,

I have updated my gist fixing my gross vocabulary mistake ("undefined" to 
"unspecified") and I have also added an example of the crossed bond depiction 
by changing the BondStereo attribute to STEREOANY.

@Adelene: I think you touched an interesting point here. There are indeed cases 
where it would be nice to address the SMILES ambiguity (no way to symbolically 
discriminate "unspecified" from "unknown") more efficiently than by doing a 
time-consuming (and potentially error-prone) substructure match and BondStereo 
replacement on all input molecules, particularly if you have a large number of 
those.
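
For reference, the per-molecule marking step could be sketched like this. It is not the proposed parser flag and not the substructure-match loop itself: it swaps in RDKit's Chem.FindPotentialStereoBonds to find the candidate bonds. The helper name mark_unspecified_as_unknown and the example SMILES are made up for illustration.

```python
from rdkit import Chem


def mark_unspecified_as_unknown(mol):
    """Flag double bonds that could be cis/trans as STEREOANY (in place)."""
    Chem.FindPotentialStereoBonds(mol)
    return mol


smiles = ['CC=CC', 'CC=C(C)C', 'C=CC']
mols = [mark_unspecified_as_unknown(Chem.MolFromSmiles(s)) for s in smiles]

for smi, mol in zip(smiles, mols):
    flags = [bond.GetStereo() for bond in mol.GetBonds()
             if bond.GetBondType() == Chem.BondType.DOUBLE]
    print(smi, flags)
```

Only but-2-ene has a double bond that can actually be cis or trans, so the other two should keep STEREONONE.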

I propose to do that by adding an unspecifiedBondStereoMeansUnknown flag 
(suggestions for a better name are welcome) to SmilesParserParams - I believe 
that would be useful to many.

Cheers,
p.

On Wed, Oct 21, 2020 at 8:00 AM Adelene LAI wrote:

Hi Greg, Hi Paolo,


@Paolo - thanks for the updated gist!


@Greg - thanks for this detailed explanation. I think it makes sense to equate 
unspecified with unknown stereochem. I can't think of any obvious caveats to 
this convention change for now (but maybe others in the community can?).


When you say "have unspecified double bonds be marked as unknown", you mean 
have unspecified double bonds be represented by crossed bonds too?


If so, would this loop you're suggesting be computationally not-too-expensive 
when working with 1000s of molecules?




Thanks and good morning!


Adelene


Doctoral Researcher
Environmental Cheminformatics
UNIVERSITÉ DU LUXEMBOURG

LUXEMBOURG CENTRE FOR SYSTEMS BIOMEDICINE
6, avenue du Swing, L-4367 Belvaux
T +356 46 66 44 67 18
[github.png] adelenelai






From: Greg Landrum <greg.land...@gmail.com>
Sent: Wednesday, October 21, 2020 6:15:58 AM
To: Adelene LAI
Cc: rdkit-discuss
Subject: Re: [Rdkit-discuss] How to preserve undefined stereochemistry?

Paolo's gist includes a vocabulary mistake[1] that I think is confusing things 
here.

In the RDKit the stereochemistry of a double bond can be unspecified, unknown, 
or known. Unspecified means that you haven't said anything about what the 
stereo is; unknown means that you've actively provided the information that you 
don't know what the stereochemistry is; known is clear.

The RDKit only draws crossed bonds in molecule drawings when the 
stereochemistry of th

Re: [Rdkit-discuss] How to preserve undefined stereochemistry?

2020-10-22 Thread Adelene LAI
Dear Paolo,



Thanks for updating the gist - it's a really important resource for me and 
probably future RDKit beginners too. Thanks.


I like your suggestion to add the unspecifiedBondStereoMeansUnknown flag to
SmilesParserParams. I think this circumvents having to do a substructure-match +
BondStereo replacement loop.


To clarify, will implementing the above effectively mean unspecified stereo 
will be depicted as a crossed double bond too?


Because then, the only way to differentiate between stereo unspecified and
stereo unknown would be to run bond.GetStereo(), which would give STEREONONE or
STEREOANY respectively. I think this would be OK...unless depiction-folks have
alternative suggestions.
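
A minimal sketch of that bond.GetStereo() check, assuming the usual enum
meanings (STEREONONE for unspecified, STEREOANY for unknown, anything else
for known). The helper names stereo_status and double_bond_statuses are
hypothetical, not part of the RDKit API:

```python
from rdkit import Chem


def stereo_status(bond):
    """Classify one bond as 'unspecified', 'unknown' or 'known'."""
    stereo = bond.GetStereo()
    if stereo == Chem.BondStereo.STEREONONE:
        return "unspecified"
    if stereo == Chem.BondStereo.STEREOANY:
        return "unknown"
    return "known"  # STEREOE, STEREOZ, STEREOCIS or STEREOTRANS


def double_bond_statuses(mol):
    """Return the status of every double bond in the molecule."""
    return [stereo_status(b) for b in mol.GetBonds()
            if b.GetBondType() == Chem.BondType.DOUBLE]


unspecified = Chem.MolFromSmiles("CC=CC")
known = Chem.MolFromSmiles("C/C=C/C")

# Mark the double bond of a copy as "unknown" by hand.
unknown = Chem.Mol(unspecified)
for b in unknown.GetBonds():
    if b.GetBondType() == Chem.BondType.DOUBLE:
        b.SetStereo(Chem.BondStereo.STEREOANY)

for name, m in [("CC=CC", unspecified), ("C/C=C/C", known),
                ("CC=CC + STEREOANY", unknown)]:
    print(name, double_bond_statuses(m))
```

So even if both cases were drawn with a crossed bond, the two should still
be distinguishable programmatically as long as the parser keeps STEREONONE
for plain SMILES input.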


Adelene














Doctoral Researcher
Environmental Cheminformatics
UNIVERSITÉ DU LUXEMBOURG

LUXEMBOURG CENTRE FOR SYSTEMS BIOMEDICINE
6, avenue du Swing, L-4367 Belvaux
T +356 46 66 44 67 18
[github.png] adelenelai






From: Paolo Tosco 
Sent: Wednesday, October 21, 2020 10:56:24 AM
To: Adelene LAI
Cc: Greg Landrum; rdkit-discuss
Subject: Re: [Rdkit-discuss] How to preserve undefined stereochemistry?

Hi Adelene, Greg,

I have updated my gist fixing my gross vocabulary mistake ("undefined" to 
"unspecified") and I have also added an example of the crossed bond depiction 
by changing the BondStereo attribute to STEREOANY.

@Adelene: I think you touched an interesting point here. There are indeed cases 
where it would be nice to address the SMILES ambiguity (no way to symbolically 
discriminate "unspecified" from "unknown") more efficiently than by doing a 
time-consuming (and potentially error-prone) substructure match and BondStereo 
replacement on all input molecules, particularly if you have a large number of 
those.

I propose to do that by adding a unspecifiedBondStereoMeansUnknown (suggestion 
on a better name welcome) flag to SmilesParserParams - I believe that would be 
useful to many.

Cheers,
p.

On Wed, Oct 21, 2020 at 8:00 AM Adelene LAI <adelene@uni.lu> wrote:

Hi Greg, Hi Paolo,


@Paolo - thanks for the updated gist!


@Greg - thanks for this detailed explanation. I think it makes sense to equate 
unspecified with unknown stereochem. I can't think of any obvious caveats to 
this convention change for now (but maybe others in the community can?).


When you say "have unspecified double bonds be marked as unknown", you mean 
have unspecified double bonds be represented by crossed bonds too?


If so, would this loop you're suggesting be computationally not-too-expensive 
when working with 1000s of molecules?




Thanks and good morning!


Adelene


Doctoral Researcher
Environmental Cheminformatics
UNIVERSITÉ DU LUXEMBOURG

LUXEMBOURG CENTRE FOR SYSTEMS BIOMEDICINE
6, avenue du Swing, L-4367 Belvaux
T +356 46 66 44 67 18
[github.png] adelenelai






From: Greg Landrum <greg.land...@gmail.com>
Sent: Wednesday, October 21, 2020 6:15:58 AM
To: Adelene LAI
Cc: rdkit-discuss
Subject: Re: [Rdkit-discuss] How to preserve undefined stereochemistry?

Paolo's gist includes a vocabulary mistake[1] that I think is confusing things 
here.

In the RDKit the stereochemistry of a double bond can be unspecified, unknown, 
or known. Unspecified means that you haven't said anything about what the 
stereo is; unknown means that you've actively provided the information that you 
don't know what the stereochemistry is; known is clear.

The RDKit only draws crossed bonds in molecule drawings when the 
stereochemistry of the double bond is unknown.

The problem here is that in standard SMILES there is no way to actively specify 
that you don't know the stereochemistry of a double bond (the same thing 
applies to stereocenters). You can either provide information about the 
stereochemistry by using "/" and "\" bonds, or you provide no information. So 
the SMILES C/C=C/C produces a double bond with known stereochemistry but CC=CC 
produces a double bond with unspecified stereochemistry.

If, based on what you know about the SMILES that you are parsing, you would 
like to change the convention and have unspecified double bonds be marked as 
unknown, it's straightforward to write a script that loops over the molecule 
and makes that change (watch out for ring bonds).
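
A rough sketch of such a loop, under stated assumptions: the function name
mark_unspecified_as_unknown is hypothetical, and the filter for "could this
bond have stereo at all" is deliberately crude (both ends need at least one
other heavy-atom neighbour). It does not check whether the two substituents
on one end are identical, so treat it as a starting point only:

```python
from rdkit import Chem


def mark_unspecified_as_unknown(mol):
    """Set STEREOANY on acyclic double bonds that carry no stereo info.

    Modifies the molecule in place and returns it for convenience.
    """
    for bond in mol.GetBonds():
        if bond.GetBondType() != Chem.BondType.DOUBLE:
            continue
        if bond.IsInRing():
            # "watch out for ring bonds": leave them untouched
            continue
        if bond.GetStereo() != Chem.BondStereo.STEREONONE:
            # stereo already known (or already marked unknown)
            continue
        begin, end = bond.GetBeginAtom(), bond.GetEndAtom()
        if begin.GetDegree() < 2 or end.GetDegree() < 2:
            # terminal double bonds such as C=O or C=CH2
            continue
        bond.SetStereo(Chem.BondStereo.STEREOANY)
    return mol


def double_bond_stereo(mol):
    """Collect the stereo flag of every double bond, for inspection."""
    return [b.GetStereo() for b in mol.GetBonds()
            if b.GetBondType() == Chem.BondType.DOUBLE]


for smiles in ["CC=CC", "C/C=C/C", "C1CCC=CC1", "CC=O"]:
    m = mark_unspecified_as_unknown(Chem.MolFromSmiles(smiles))
    print(smiles, double_bond_stereo(m))
```

The loop visits each bond once, so the cost grows linearly with molecule
size and should be small next to SMILES parsing itself.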

-greg
[1] Perhaps "mistake" isn't the right word. It's confusing

On Tue, Oct 20, 2020 at 1:54 PM Paolo Tosco <paolo.tosco.m...@gmail.com> wrote:
Hi Adelene,

this gist

https://gist.github.com/ptosco/1e1c23ad24c90444993fa1db21ccb48b

shows how to add stereo annotations to RDKit 2D depictions, and also how to 
access the double bond stereochemistry programmatically.

Cheers,
p.


On Tue, Oct 20, 2020 at 12:24 PM Adelene LAI <adelene@uni.lu> wrote:

Hi RDKit Community,


Is there a way to preserve undefined stereochemistry aka unspecified 
ster

Re: [Rdkit-discuss] How to preserve undefined stereochemistry?

2020-10-21 Thread Adelene LAI
Hi Greg, Hi Paolo,


@Paolo - thanks for the updated gist!


@Greg - thanks for this detailed explanation. I think it makes sense to equate 
unspecified with unknown stereochem. I can't think of any obvious caveats to 
this convention change for now (but maybe others in the community can?).


When you say "have unspecified double bonds be marked as unknown", you mean 
have unspecified double bonds be represented by crossed bonds too?


If so, would this loop you're suggesting be computationally not-too-expensive 
when working with 1000s of molecules?




Thanks and good morning!


Adelene


Doctoral Researcher
Environmental Cheminformatics
UNIVERSITÉ DU LUXEMBOURG

LUXEMBOURG CENTRE FOR SYSTEMS BIOMEDICINE
6, avenue du Swing, L-4367 Belvaux
T +356 46 66 44 67 18
[github.png] adelenelai






From: Greg Landrum 
Sent: Wednesday, October 21, 2020 6:15:58 AM
To: Adelene LAI
Cc: rdkit-discuss
Subject: Re: [Rdkit-discuss] How to preserve undefined stereochemistry?

Paolo's gist includes a vocabulary mistake[1] that I think is confusing things 
here.

In the RDKit the stereochemistry of a double bond can be unspecified, unknown, 
or known. Unspecified means that you haven't said anything about what the 
stereo is; unknown means that you've actively provided the information that you 
don't know what the stereochemistry is; known is clear.

The RDKit only draws crossed bonds in molecule drawings when the 
stereochemistry of the double bond is unknown.

The problem here is that in standard SMILES there is no way to actively specify 
that you don't know the stereochemistry of a double bond (the same thing 
applies to stereocenters). You can either provide information about the 
stereochemistry by using "/" and "\" bonds, or you provide no information. So 
the SMILES C/C=C/C produces a double bond with known stereochemistry but CC=CC 
produces a double bond with unspecified stereochemistry.

If, based on what you know about the SMILES that you are parsing, you would 
like to change the convention and have unspecified double bonds be marked as 
unknown, it's straightforward to write a script that loops over the molecule 
and makes that change (watch out for ring bonds).

-greg
[1] Perhaps "mistake" isn't the right word. It's confusing

On Tue, Oct 20, 2020 at 1:54 PM Paolo Tosco <paolo.tosco.m...@gmail.com> wrote:
Hi Adelene,

this gist

https://gist.github.com/ptosco/1e1c23ad24c90444993fa1db21ccb48b

shows how to add stereo annotations to RDKit 2D depictions, and also how to 
access the double bond stereochemistry programmatically.

Cheers,
p.


On Tue, Oct 20, 2020 at 12:24 PM Adelene LAI <adelene@uni.lu> wrote:

Hi RDKit Community,


Is there a way to preserve undefined stereochemistry aka unspecified 
stereochemistry when doing MolFromSmiles?


I'm working with a bunch of molecules, some with stereochemistry defined, some 
without.


If stereochemistry is undefined in the SMILES, I would like it to stay that way 
when converted to a Mol, but this doesn't seem to be the case:


> mol = 
> Chem.MolFromSmiles('CC(C)(C1=CC(=C(C(=C1)Br)O)Br)C(=CC(C(=O)O)Br)CC(=O)O')
> mol

[image: RDKit depiction of mol]

One would expect that C=C to either be crossed, as in PubChem's depiction:

https://pubchem.ncbi.nlm.nih.gov/compound/139598257#section=2D-Structure



or that single bond to be squiggly, as in CDK's depiction:

[image: CDK depiction]

But it's not just a matter of depiction, as it seems internally, mol is 
equivalent to its stereochem-specific sibling (Entgegen form)


CC(C)(C1=CC(=C(C(=C1)Br)O)Br)/C(=C/C(C(=O)O)Br)/CC(=O)O



I've tried sanitize=False, but it doesn't seem to have any effect. I would 
prefer not having to manually SetStereo(Chem.BondStereo.STEREOANY) for every 
molecule with undefined stereochem (not sure how I would even go about that...).


Possibly related to:

https://sourceforge.net/p/rdkit/mailman/rdkit-discuss/thread/C00BE94F-6F6F-466A-83D4-3045C9006026%40gmail.com/#msg34929570


https://sourceforge.net/p/rdkit/mailman/rdkit-discuss/thread/CAHOi4k3revAu-9qhFt0MpUpr0aADQ9d8bV2XT6FurTEKimCQng%40mail.gmail.com/#msg36365128
o = Chem.MolFromSmiles('C/C=C/C')

https://www.rdkit.org/docs/source/rdkit.Chem.EnumerateStereoisomers.html

https://github.com/openforcefield/openforcefield/issues/146




Any help would be much appreciated.


Thanks,

Adelene







Doctoral Researcher
Environmental Cheminformatics
UNIVERSITÉ DU LUXEMBOURG

LUXEMBOURG CENTRE FOR SYSTEMS BIOMEDICINE
6

Re: [Rdkit-discuss] How to preserve undefined stereochemistry?

2020-10-20 Thread Adelene LAI
Hi Dave and Paolo,


Thanks for your helpful replies.


@Dave, issue created: https://github.com/rdkit/rdkit/issues/3514


@Paolo, your gist shows that the internal representation of the mol does indeed
keep the stereo as undefined, contrary to the way it is depicted.


But why then does this happen when I check if the 2 molecules are the same?


from rdkit import Chem

# same structure, without and with double bond stereo in the SMILES
smi = Chem.MolFromSmiles('CC(C)(C1=CC(=C(C(=C1)Br)O)Br)C(=CC(C(=O)O)Br)CC(=O)O')
isosmi = Chem.MolFromSmiles('CC(C)(C1=CC(=C(C(=C1)Br)O)Br)/C(=C/C(C(=O)O)Br)/CC(=O)O')
print(smi == isosmi)                  # True, expect False
print(smi.HasSubstructMatch(isosmi))  # True, expect False
print(isosmi.HasSubstructMatch(smi))  # True, expect False
print(smi.HasSubstructMatch(isosmi) and isosmi.HasSubstructMatch(smi))  # True, expect False


However, converting smi and isosmi to canonical smiles and comparing them gives 
False, as expected:

a = Chem.CanonSmiles('CC(C)(C1=CC(=C(C(=C1)Br)O)Br)C(=CC(C(=O)O)Br)CC(=O)O')
b = Chem.CanonSmiles('CC(C)(C1=CC(=C(C(=C1)Br)O)Br)/C(=C/C(C(=O)O)Br)/CC(=O)O')
a == b   #False


(If there are better ways to check if 2 molecules are equal, I'd be interested 
to know.)
https://sourceforge.net/p/rdkit/mailman/rdkit-discuss/thread/9DF05ED7-A30E-4742-A568-9B3995689382%40dalkescientific.com/#msg29882815
 ?
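
One possible way to make the comparison stereo-aware, sketched under the
assumption that canonical isomeric SMILES is an acceptable identity test
for this use case. HasSubstructMatch ignores stereochemistry by default;
passing useChirality=True asks it to take stereo into account, and I would
expect that to stop the stereo-defined query from matching the unspecified
molecule, though this has not been verified here:

```python
from rdkit import Chem

smi = "CC(C)(C1=CC(=C(C(=C1)Br)O)Br)C(=CC(C(=O)O)Br)CC(=O)O"
isosmi = "CC(C)(C1=CC(=C(C(=C1)Br)O)Br)/C(=C/C(C(=O)O)Br)/CC(=O)O"
mol = Chem.MolFromSmiles(smi)
isomol = Chem.MolFromSmiles(isosmi)

# Canonical SMILES keep the "/" marks by default, so the strings differ.
same_with_stereo = Chem.MolToSmiles(mol) == Chem.MolToSmiles(isomol)

# Dropping stereo from the output compares connectivity only.
same_graph = (Chem.MolToSmiles(mol, isomericSmiles=False)
              == Chem.MolToSmiles(isomol, isomericSmiles=False))

# Default substructure matching does not look at stereo.
default_match = mol.HasSubstructMatch(isomol)

# Stereo-aware matching: the query carries stereo, the target does not.
strict_match = mol.HasSubstructMatch(isomol, useChirality=True)

print("same canonical SMILES (with stereo):", same_with_stereo)
print("same canonical SMILES (no stereo):  ", same_graph)
print("default substructure match:         ", default_match)
print("useChirality substructure match:    ", strict_match)
```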


Adelene





Doctoral Researcher
Environmental Cheminformatics
UNIVERSITÉ DU LUXEMBOURG

LUXEMBOURG CENTRE FOR SYSTEMS BIOMEDICINE
6, avenue du Swing, L-4367 Belvaux
T +356 46 66 44 67 18
[github.png] adelenelai






From: Paolo Tosco 
Sent: Tuesday, October 20, 2020 1:52:12 PM
To: Adelene LAI
Cc: rdkit-discuss
Subject: Re: [Rdkit-discuss] How to preserve undefined stereochemistry?

Hi Adelene,

this gist

https://gist.github.com/ptosco/1e1c23ad24c90444993fa1db21ccb48b

shows how to add stereo annotations to RDKit 2D depictions, and also how to 
access the double bond stereochemistry programmatically.

Cheers,
p.


On Tue, Oct 20, 2020 at 12:24 PM Adelene LAI <adelene@uni.lu> wrote:

Hi RDKit Community,


Is there a way to preserve undefined stereochemistry aka unspecified 
stereochemistry when doing MolFromSmiles?


I'm working with a bunch of molecules, some with stereochemistry defined, some 
without.


If stereochemistry is undefined in the SMILES, I would like it to stay that way 
when converted to a Mol, but this doesn't seem to be the case:


> mol = 
> Chem.MolFromSmiles('CC(C)(C1=CC(=C(C(=C1)Br)O)Br)C(=CC(C(=O)O)Br)CC(=O)O')
> mol

[image: RDKit depiction of mol]

One would expect that C=C to either be crossed, as in PubChem's depiction:

https://pubchem.ncbi.nlm.nih.gov/compound/139598257#section=2D-Structure



or that single bond to be squiggly, as in CDK's depiction:

[https://www.simolecule.com/cdkdepict/depict/bow/svg?smi=CC(C)(C1%3DCC(%3DC(C(%3DC1)Br)O)Br)C(%3DCC(C(%3DO)O)Br)CC(%3DO)O=80=50=on=bridgehead=false=1.6=none]

But it's not just a matter of depiction, as it seems internally, mol is 
equivalent to its stereochem-specific sibling (Entgegen form)


CC(C)(C1=CC(=C(C(=C1)Br)O)Br)/C(=C/C(C(=O)O)Br)/CC(=O)O



I've tried sanitize=False, but it doesn't seem to have any effect. I would 
prefer not having to manually SetStereo(Chem.BondStereo.STEREOANY) for every 
molecule with undefined stereochem (not sure how I would even go about that...).


Possibly related to:

https://sourceforge.net/p/rdkit/mailman/rdkit-discuss/thread/C00BE94F-6F6F-466A-83D4-3045C9006026%40gmail.com/#msg34929570


https://sourceforge.net/p/rdkit/mailman/rdkit-discuss/thread/CAHOi4k3revAu-9qhFt0MpUpr0aADQ9d8bV2XT6FurTEKimCQng%40mail.gmail.com/#msg36365128
o = Chem.MolFromSmiles('C/C=C/C')

https://www.rdkit.org/docs/source/rdkit.Chem.EnumerateStereoisomers.html

https://github.com/openforcefield/openforcefield/issues/146




Any help would be much appreciated.


Thanks,

Adelene







Doctoral Researcher
Environmental Cheminformatics
UNIVERSITÉ DU LUXEMBOURG

LUXEMBOURG CENTRE FOR SYSTEMS BIOMEDICINE
6, avenue du Swing, L-4367 Belvaux
T +356 46 66 44 67 18
[github.png] adelenelai





___
Rdkit-discuss mailing list
Rdkit-discuss@l

[Rdkit-discuss] How to preserve undefined stereochemistry?

2020-10-20 Thread Adelene LAI
Hi RDKit Community,


Is there a way to preserve undefined stereochemistry aka unspecified 
stereochemistry when doing MolFromSmiles?


I'm working with a bunch of molecules, some with stereochemistry defined, some 
without.


If stereochemistry is undefined in the SMILES, I would like it to stay that way 
when converted to a Mol, but this doesn't seem to be the case:


> mol = 
> Chem.MolFromSmiles('CC(C)(C1=CC(=C(C(=C1)Br)O)Br)C(=CC(C(=O)O)Br)CC(=O)O')
> mol

[image: RDKit depiction of mol]

One would expect that C=C to either be crossed, as in PubChem's depiction:

https://pubchem.ncbi.nlm.nih.gov/compound/139598257#section=2D-Structure

[image: PubChem depiction]


or that single bond to be squiggly, as in CDK's depiction:

[https://www.simolecule.com/cdkdepict/depict/bow/svg?smi=CC(C)(C1%3DCC(%3DC(C(%3DC1)Br)O)Br)C(%3DCC(C(%3DO)O)Br)CC(%3DO)O=80=50=on=bridgehead=false=1.6=none]

But it's not just a matter of depiction, as it seems internally, mol is 
equivalent to its stereochem-specific sibling (Entgegen form)


CC(C)(C1=CC(=C(C(=C1)Br)O)Br)/C(=C/C(C(=O)O)Br)/CC(=O)O



I've tried sanitize=False, but it doesn't seem to have any effect. I would 
prefer not having to manually SetStereo(Chem.BondStereo.STEREOANY) for every 
molecule with undefined stereochem (not sure how I would even go about that...).


Possibly related to:

https://sourceforge.net/p/rdkit/mailman/rdkit-discuss/thread/C00BE94F-6F6F-466A-83D4-3045C9006026%40gmail.com/#msg34929570





https://sourceforge.net/p/rdkit/mailman/rdkit-discuss/thread/CAHOi4k3revAu-9qhFt0MpUpr0aADQ9d8bV2XT6FurTEKimCQng%40mail.gmail.com/#msg36365128
o = Chem.MolFromSmiles('C/C=C/C')

https://www.rdkit.org/docs/source/rdkit.Chem.EnumerateStereoisomers.html

https://github.com/openforcefield/openforcefield/issues/146




Any help would be much appreciated.


Thanks,

Adelene







Doctoral Researcher
Environmental Cheminformatics
UNIVERSITÉ DU LUXEMBOURG

LUXEMBOURG CENTRE FOR SYSTEMS BIOMEDICINE
6, avenue du Swing, L-4367 Belvaux
T +356 46 66 44 67 18
[github.png] adelenelai




___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


[Rdkit-discuss] Advanced Substructure Matching question

2020-04-30 Thread Adelene LAI SHUEN LYN
Hello,


I'm trying to understand the example below.


https://www.rdkit.org/docs/GettingStartedInPython.html#advanced-substructure-matching


Why is it when checking for "all_carbon" that the substructure with indices 
6,11,17,5,4 is filtered out?


Isn't atom 4 also a carbon?

Or does it have something to do with it having the all_carbon query set as one 
of its properties?



Thanks,

Adelene




___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss