Re: [Rdkit-discuss] Sanitize molecule with explicit Hydrogens to catch an error

2020-05-11 Thread Paolo Tosco

Dear Pablo,

You might do something along these lines:

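from rdkit import Chem

smi = "[H]C([H])O"

# Keep the [H] atoms from the input instead of merging them into the heavy atoms
params = Chem.SmilesParserParams()
params.sanitize = True
params.removeHs = False
mol = Chem.MolFromSmiles(smi, params)

# An atom that still needs implicit Hs was under-specified in the input
for a in mol.GetAtoms():
    if a.GetNumImplicitHs():
        print("Wrong valence for atom {0:s} {1:d}".format(a.GetSymbol(), a.GetIdx()))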

Wrong valence for atom C 1
Wrong valence for atom O 3
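
Since the goal is to get an actual error rather than a printed message, the same check can be wrapped in a function that raises. This is only a sketch: the function name parse_explicit_h_smiles and the choice of ValueError are illustrative assumptions, not part of RDKit.

```python
from rdkit import Chem


def parse_explicit_h_smiles(smi):
    """Parse a SMILES whose hydrogens are all meant to be written out.

    Hypothetical helper: raises ValueError if the SMILES cannot be parsed
    or if any atom would still need implicit hydrogens.
    """
    params = Chem.SmilesParserParams()
    params.removeHs = False  # keep the [H] atoms from the input as real atoms
    mol = Chem.MolFromSmiles(smi, params)
    if mol is None:
        raise ValueError("Could not parse SMILES: {0}".format(smi))

    # Collect every atom to which sanitization had to assign implicit Hs
    incomplete = [
        "{0}{1}".format(a.GetSymbol(), a.GetIdx())
        for a in mol.GetAtoms()
        if a.GetNumImplicitHs() > 0
    ]
    if incomplete:
        raise ValueError("Atoms missing explicit hydrogens: " + ", ".join(incomplete))
    return mol
```

Calling it with "[H]C([H])O" should raise and name the carbon and oxygen atoms, while "[H]C([H])([H])O[H]" should return a mol object.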
 


HTH, cheers
p.

On 11/05/2020 13:33, Pablo Ramos wrote:


Dear all,

I am trying to raise an error whenever the SMILES associated with a 
mol object does not describe a valid molecule. To do this, I want to 
use the sanitize function: if the SMILES is incorrect, I will get my 
error.


My SMILES with *explicit hydrogens* is the following: [H]C([H])O

I want it to produce an error, since the valences do not match the 
ones expected for carbon and oxygen given that the hydrogens are 
already explicit: C_val = 4 ; O_val = 2


However, sanitizing this object automatically adds the missing 
hydrogens, giving a valid SMILES, [H]OC([H])([H])[H], and therefore 
it assumes the SMILES is correct.


Is there any way to specify during sanitization that my hydrogens 
are already explicit, so that I get my error message?


Thank you so much :)

Best regards,

*Pablo Ramos*

Ph.D. at Covestro Deutschland AG



covestro.com 

*Telephone*

+49 214 6009 7356



Covestro Deutschland AG

COVDEAG-Chief Commer-PUR-R

B103, R164

51365 Leverkusen, Germany

_pablo.ramos@covestro.com_



___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Sanitize molecule with explicit Hydrogens to catch an error

2020-05-11 Thread Ivan Tubert-Brohman
Hi Pablo,

SMILES by definition has implicit hydrogens (enough to satisfy the typical
valence) for atoms that are not within brackets.

It doesn't matter if you write C, C[H], [H]C[H], or [H]C([H])([H])[H]; they
are all methane. The number of hydrogens returned by
GetNumImplicitHs() and GetNumExplicitHs() on the carbon atom does vary,
though. But if you convert back to SMILES, you get "C" in all cases (with
the default options).
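
A quick way to see this for yourself is a short round trip through the parser with its default options. The variable names below are only illustrative.

```python
from rdkit import Chem

# Four ways of writing methane, with zero to four hydrogens spelled out
variants = ["C", "C[H]", "[H]C[H]", "[H]C([H])([H])[H]"]

# Parse with the default options (sanitize, remove Hs) and write back out
canonical = {smi: Chem.MolToSmiles(Chem.MolFromSmiles(smi)) for smi in variants}

for smi, out in canonical.items():
    print(smi, "->", out)
```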

If you want an atom with no implicit hydrogens, you have to put it in
brackets. [C] is a lone carbon atom. [CH3] or [H][C]([H])[H] is a methyl
radical.
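
The effect of the brackets can be inspected directly on the parsed atom. This is a small sketch; the helper name hydrogen_and_radical_counts is an assumption made for illustration.

```python
from rdkit import Chem


def hydrogen_and_radical_counts(smi):
    """Return (total Hs, radical electrons) for the first atom of a SMILES."""
    atom = Chem.MolFromSmiles(smi).GetAtomWithIdx(0)
    return atom.GetTotalNumHs(), atom.GetNumRadicalElectrons()


for smi in ["C", "[C]", "[CH3]"]:
    print(smi, hydrogen_and_radical_counts(smi))
```

The unbracketed C is filled up to four hydrogens with no radical electrons, while [CH3] keeps its three hydrogens and is reported with one radical electron.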

But you still won't get sanitization errors for such species, because
radicals are legitimate species after all, and I imagine RDKit doesn't want
to guess whether you like radicals or not. So I suspect you'll have to
write your own "sanity" function which uses whatever definition of sanity
you need. For example, you could loop over the atoms and check the value of
GetTotalValence() and compare it with the expected valence for that
element. Or perhaps you could make use of GetNumRadicalElectrons().
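
As a rough sketch of such a function: the EXPECTED_VALENCE table and the name sanity_problems below are assumptions for illustration, covering only a few neutral elements, and charged atoms are skipped in the valence comparison.

```python
from rdkit import Chem

# Hypothetical table of "sane" valences for neutral atoms; extend as needed
EXPECTED_VALENCE = {"H": 1, "C": 4, "N": 3, "O": 2}


def sanity_problems(mol):
    """Return a list of human-readable problems found in mol."""
    problems = []
    for atom in mol.GetAtoms():
        label = "{0}{1}".format(atom.GetSymbol(), atom.GetIdx())
        n_rad = atom.GetNumRadicalElectrons()
        if n_rad > 0:
            problems.append("{0}: {1} radical electron(s)".format(label, n_rad))
        expected = EXPECTED_VALENCE.get(atom.GetSymbol())
        # Only compare valences for neutral atoms of elements in the table
        if expected is not None and atom.GetFormalCharge() == 0:
            valence = atom.GetTotalValence()
            if valence != expected:
                problems.append(
                    "{0}: valence {1}, expected {2}".format(label, valence, expected)
                )
    return problems


print(sanity_problems(Chem.MolFromSmiles("[CH3]")))
```

Note that GetTotalValence() includes implicit hydrogens, so this check only flags atoms written in brackets; unbracketed atoms are filled up by the parser before the check runs.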

Hope this helps,
Ivan

On Mon, May 11, 2020 at 9:37 AM Pablo Ramos wrote:

> Dear all,
>
>
>
> I am trying to raise an error whenever the SMILES associated with a mol
> object does not describe a valid molecule. To do this, I want to use the
> sanitize function: if the SMILES is incorrect, I will get my error.
>
> My SMILES with *explicit hydrogens* is the following: [H]C([H])O
>
>
>
> I want it to produce an error, since the valences do not match the ones
> expected for carbon and oxygen given that the hydrogens are already
> explicit: C_val = 4 ; O_val = 2
>
>
>
> However, sanitizing this object automatically adds the missing hydrogens,
> giving a valid SMILES, [H]OC([H])([H])[H], and therefore it assumes the
> SMILES is correct.
>
>
>
> Is there any way to specify during sanitization that my hydrogens are
> already explicit, so that I get my error message?
>
>
>
> Thank you so much :)
>
>
>
>
>
> Best regards,
>
>
>
> *Pablo Ramos*
>
> Ph.D. at Covestro Deutschland AG
>
>
>
>
>
> covestro.com 
>
> *Telephone*
>
> +49 214 6009 7356
>
>
>
> Covestro Deutschland AG
>
> COVDEAG-Chief Commer-PUR-R
>
> B103, R164
>
> 51365 Leverkusen, Germany
>
> *pablo.ra...@covestro.com *
>
>
>
>


[Rdkit-discuss] Sanitize molecule with explicit Hydrogens to catch an error

2020-05-11 Thread Pablo Ramos
Dear all,

I am trying to raise an error whenever the SMILES associated with a mol 
object does not describe a valid molecule. To do this, I want to use the 
sanitize function: if the SMILES is incorrect, I will get my error.

My SMILES with explicit hydrogens is the following: [H]C([H])O

I want it to produce an error, since the valences do not match the ones 
expected for carbon and oxygen given that the hydrogens are already 
explicit: C_val = 4 ; O_val = 2


However, sanitizing this object automatically adds the missing hydrogens, 
giving a valid SMILES, [H]OC([H])([H])[H], and therefore it assumes the 
SMILES is correct.
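
For reference, the behaviour described above can be reproduced with a few lines. This is a minimal sketch assuming the default parser options; the variable names are illustrative.

```python
from rdkit import Chem

# Default parsing sanitizes the molecule and merges the written-out hydrogens
mol = Chem.MolFromSmiles("[H]C([H])O")
print(Chem.MolToSmiles(mol))  # prints "CO", i.e. methanol

# Adding hydrogens back gives a fully saturated methanol with six atoms
mol_h = Chem.AddHs(mol)
print(Chem.MolToSmiles(mol_h))
```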


Is there any way to specify during sanitization that my hydrogens are 
already explicit, so that I get my error message?

Thank you so much :)


Best regards,

Pablo Ramos
Ph.D. at Covestro Deutschland AG

covestro.com
Telephone
+49 214 6009 7356

Covestro Deutschland AG
COVDEAG-Chief Commer-PUR-R
B103, R164
51365 Leverkusen, Germany
pablo.ra...@covestro.com

