[Rdkit-discuss] ETKDG improvement for small and large rings

2020-05-11 Thread Sereina Riniker
Dear RDKit Users, 

For your information (and as a bit of advertisement): 
We have recently developed and published an extension of the ETKDG conformer 
generator to improve sampling of small and large rings, which is available in 
the 2020.03 release of the RDKit.

Shuzhe Wang, Jagna Witek, Greg Landrum, Sereina Riniker, J. Chem. Inf. Model., 
60, 2044 (2020)
"Improving Conformer Generation for Small Rings and Macrocycles Based 
on Distance Geometry and Experimental Torsional-Angle Preferences"
https://pubs.acs.org/doi/10.1021/acs.jcim.0c00025

If you want to try it out, Shuzhe has added a section in the RDKit cookbook to 
showcase the new functionalities:
https://github.com/rdkit/rdkit/blob/master/Docs/Book/Cookbook.rst#conformer-generation-with-etkdg
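
A minimal sketch of what trying it out might look like, assuming RDKit 2020.03
or later. The molecule, the number of conformers, and the random seed are
illustrative choices and are not taken from the paper or the cookbook.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

# Illustrative molecule only: cyclododecane, a 12-membered ring.
mol = Chem.AddHs(Chem.MolFromSmiles("C1CCCCCCCCCCC1"))

# Start from the ETKDGv3 parameter set and additionally switch on the
# small-ring torsion preferences.
params = AllChem.ETKDGv3()
params.useSmallRingTorsions = True
params.randomSeed = 42  # fixed seed (arbitrary value) for reproducibility

# Request 5 conformers; the returned IDs index the conformers stored on mol.
conf_ids = list(AllChem.EmbedMultipleConfs(mol, 5, params))
print(f"Embedded {len(conf_ids)} conformers")
```

The cookbook section linked above remains the reference for the recommended
settings.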

We hope that you find it useful, and we would be happy to receive any feedback!

Best regards,
Sereina


 - - - 

Prof. Dr. Sereina Riniker
ETH Zürich
Laboratory of Physical Chemistry
HCI G225
Vladimir-Prelog-Weg 2
8093 Zürich
+41 44 633 42 39
srini...@ethz.ch
www.riniker.ethz.ch

___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Sanitize molecule with explicit Hydrogens to catch an error

2020-05-11 Thread Paolo Tosco

Dear Pablo,

You might do something along these lines:

from rdkit import Chem

smi = "[H]C([H])O"

# Sanitize as usual, but keep the explicit H atoms in the molecule
params = Chem.SmilesParserParams()
params.sanitize = True
params.removeHs = False
mol = Chem.MolFromSmiles(smi, params)

# Any atom that still received implicit Hs was under-specified in the input
for a in mol.GetAtoms():
    if a.GetNumImplicitHs():
        print("Wrong valence for atom {0:s} {1:d}".format(a.GetSymbol(), a.GetIdx()))

This prints:

Wrong valence for atom C 1
Wrong valence for atom O 3


HTH, cheers
p.



Re: [Rdkit-discuss] Sanitize molecule with explicit Hydrogens to catch an error

2020-05-11 Thread Ivan Tubert-Brohman
Hi Pablo,

SMILES by definition has implicit hydrogens (enough to satisfy the typical
valence) for atoms that are not within brackets.

It doesn't matter if you write C, C[H], [H]C[H], or [H]C([H])([H])[H]; they
are all methane. The numbers of hydrogens returned by GetNumImplicitHs() and
GetNumExplicitHs() on the carbon atom do vary, though. But if you convert
back to SMILES, you get "C" in all cases (with the default options).
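
A short sketch to check the point above, using default parsing and default
SMILES output. The variable names are only illustrative.

```python
from rdkit import Chem

# The four spellings of methane mentioned above.
spellings = ["C", "C[H]", "[H]C[H]", "[H]C([H])([H])[H]"]

results = {}
for smi in spellings:
    mol = Chem.MolFromSmiles(smi)  # default options: explicit H atoms are removed
    carbon = next(a for a in mol.GetAtoms() if a.GetAtomicNum() == 6)
    # Canonical SMILES and the total H count (implicit + explicit) on the carbon
    results[smi] = (Chem.MolToSmiles(mol), carbon.GetTotalNumHs())

for smi, (canonical, n_hs) in results.items():
    print(f"{smi:20s} -> {canonical}  total Hs on C: {n_hs}")
```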

If you want an atom with no implicit hydrogens, you have to put it in
brackets. [C] is a lone carbon atom. [CH3] or [H][C]([H])[H] is a methyl
radical.

But you still won't get sanitization errors for such species, because
radicals are legitimate species after all, and I imagine RDKit doesn't want
to guess whether you like radicals or not. So I suspect you'll have to
write your own "sanity" function which uses whatever definition of sanity
you need. For example, you could loop over the atoms and check the value of
GetTotalValence() and compare it with the expected valence for that
element. Or perhaps you could make use of GetNumRadicalElectrons().
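
One possible shape for such a "sanity" function, as a rough sketch only. The
function names and the definition of "problem" are assumptions made for
illustration: an atom is flagged if the parser had to add implicit hydrogens
to it, or if it ends up with radical electrons.

```python
from rdkit import Chem


def find_valence_problems(smiles):
    """Return (atom index, symbol, reason) for atoms that look under-specified."""
    params = Chem.SmilesParserParams()
    params.removeHs = False  # keep explicit H atoms so indices match the input
    mol = Chem.MolFromSmiles(smiles, params)
    if mol is None:
        raise ValueError(f"Could not parse SMILES: {smiles!r}")

    problems = []
    for atom in mol.GetAtoms():
        if atom.GetNumImplicitHs() > 0:
            # Unbracketed atom that was topped up with hydrogens by the parser
            problems.append((atom.GetIdx(), atom.GetSymbol(), "implicit Hs added"))
        elif atom.GetNumRadicalElectrons() > 0:
            # Bracketed atom left with unpaired electrons
            problems.append((atom.GetIdx(), atom.GetSymbol(), "radical electrons"))
    return problems


def require_fully_specified(smiles):
    """Raise ValueError if any atom is flagged by find_valence_problems()."""
    problems = find_valence_problems(smiles)
    if problems:
        raise ValueError(f"Under-specified atoms in {smiles!r}: {problems}")


print(find_valence_problems("[H]C([H])O"))
print(find_valence_problems("[H]OC([H])([H])[H]"))
```

Whether radicals should count as errors depends on the use case, so the second
check may need to be dropped or made optional.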

Hope this helps,
Ivan



[Rdkit-discuss] Sanitize molecule with explicit Hydrogens to catch an error

2020-05-11 Thread Pablo Ramos
Dear all,

I am trying to catch an error whenever the SMILES associated with a mol 
object does not correspond to a valid molecule. To do this, I want to use the 
sanitize function: if the SMILES is incorrect, I will get my error.
My SMILES with explicit hydrogens is the following: [H]C([H])O

I want it to raise an error, since the valences do not match the ones expected 
for carbon and oxygen when the hydrogens are already explicit: C_val = 4; 
O_val = 2


However, sanitizing this object automatically adds the missing hydrogens, 
giving a valid SMILES, [H]OC([H])([H])[H], and therefore the SMILES is 
assumed to be correct.


Is there any way to specify that my hydrogens are already explicit during 
sanitization, so that I get my error message?

Thank you so much :)


Best regards,

Pablo Ramos
Ph.D. at Covestro Deutschland AG


covestro.com
Telephone
+49 214 6009 7356

Covestro Deutschland AG
COVDEAG-Chief Commer-PUR-R
B103, R164
51365 Leverkusen, Germany
pablo.ra...@covestro.com




Re: [Rdkit-discuss] GetSubstructMatches and unique match

2020-05-11 Thread Greenpharma S.A.S.
Dear Paolo,
Thank you very much. I'll test this and get back to you.
Have a nice day.
Best regards,
Quoc-Tuan

> On 10 May 2020 at 13:09, Paolo Tosco <paolo.tosco.m...@gmail.com> wrote:
>
> Dear Quoc-Tuan,
>
> I think I have come up with a reasonably fast algorithm that seems to be
> more robust:
>
> https://gist.github.com/ptosco/dc4d27153e6e8e45aed654761e4d7409
>
> Cheers,
> p.
>
> On 06/05/2020 09:11, Quoc-Tuan DO wrote:
>
> > Dear Paolo,
> >
> > Thank you again for your code. Sorry for bothering you again. It works
> > fine for monoterpenes but not for diterpenes, sesquiterpenes, or
> > triterpenes.
> >
> > pattern: C~C~C(~C)~C
> >
> > mol1: CC(=O)O[C@H]1CC[C@]2([C@H](C1(C)C)CC=C([C@@H]2CC/C(=C/C(=O)O)/C)C)C
> >
> > => ((17, 18, 19, 20, 23), (16, 24, 13, 14, 15), (8, 9, 4, 12, 7))
> >
> > It should find 4 distinct units.
> >
> > mol2: OCC12CCC(C2C2C(CC1)(C)C1(C)CCC3C(C1CC2)(C)CCC(C3(C)C)O)C(=C)C
> >
> > => ((16, 25, 27, 17, 15), (18, 19, 12, 13, 14), (1, 2, 5, 6, 7))
> >
> > It should find 6 distinct units.
> >
> > I tried a SMARTS version of the pattern, [#6]~[#6]~[#6](~[#6])~[#6],
> > but got the same results as with SMILES.
> >
> > What do you think? Is there something missing in the query?
> >
> > Thanks for your time,
> >
> > Best regards,
> >
> > QT
> >
> > On 05/05/2020 at 14:52, Paolo Tosco wrote:
> >
> > > Dear Quoc-Tuan,
> > >
> > > this should do what you need:
> > >
> > > https://gist.github.com/ptosco/dc4d27153e6e8e45aed654761e4d7409
> > >
> > > Cheers,
> > > p.
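
For readers following the thread, a rough sketch of the underlying problem.
This is NOT the algorithm in Paolo's gist; it is a deliberately naive greedy
selection, and the function name is made up for illustration. The pattern and
the molecule are mol1 from the quoted message. GetSubstructMatches() returns
matches that may share atoms, so counting distinct units needs an extra step
that keeps only non-overlapping matches. A greedy pass like this one depends on
the order of the matches and may return fewer units than the maximum possible.

```python
from rdkit import Chem


def greedy_disjoint_matches(mol, pattern):
    """Keep each match only if it shares no atoms with the matches kept so far."""
    used = set()
    kept = []
    for match in mol.GetSubstructMatches(pattern, uniquify=True):
        if used.isdisjoint(match):
            kept.append(match)
            used.update(match)
    return kept


# Branched five-carbon pattern and mol1, both copied from the thread.
pattern = Chem.MolFromSmarts("[#6]~[#6]~[#6](~[#6])~[#6]")
mol1 = Chem.MolFromSmiles(
    "CC(=O)O[C@H]1CC[C@]2([C@H](C1(C)C)CC=C([C@@H]2CC/C(=C/C(=O)O)/C)C)C"
)

units = greedy_disjoint_matches(mol1, pattern)
print(len(units), units)
```

Finding the largest set of non-overlapping matches is a harder combinatorial
problem than this greedy pass, which is why a more robust algorithm is needed.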