Thank you, Wim, for the clarification: so the library is not inferring the
valence, it's "choosing" one.

I think I've found the solution to my issue: I should probably use
Chem.MolFromSmarts(), so that the input is parsed as a pattern rather
than as a molecule:

from rdkit import Chem
mol = Chem.MolFromSmarts('CCS=O')
Chem.MolToSmiles(mol)
'CCS=O'
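
The difference from the MolFromSmiles route follows from the rule in Wim's reply quoted below: an unbracketed atom is assigned the lowest allowed valence that fits its bonds, and the gap is filled with hydrogens. Here is a minimal pure-Python sketch of that rule. It does not call RDKit. The ALLOWED_VALENCES table and both helper functions are hypothetical names made up for illustration, and the table is a simplified assumption based on the valences mentioned in this thread, not RDKit's internal data.

```python
# Simplified, hypothetical valence table (an assumption for illustration,
# not RDKit's internal data). The first entry is the "standard" valence.
ALLOWED_VALENCES = {
    "O": (2,),
    "N": (3,),
    "S": (2, 4, 6),
    "P": (3, 5),
    "I": (1, 3),
}


def implicit_h_count(symbol, bond_order_sum):
    """Pick the lowest allowed valence that fits the bonds already drawn.

    Returns (chosen_valence, implicit_h_count). Raises ValueError when no
    allowed valence is large enough, mirroring a valence error.
    """
    for valence in ALLOWED_VALENCES[symbol]:
        if valence >= bond_order_sum:
            return valence, valence - bond_order_sum
    raise ValueError(f"{symbol} cannot have a valence of {bond_order_sum}")


def needs_explicit_h(symbol, bond_order_sum):
    """True when the hydrogens sit on a nonstandard valence and therefore
    have to be written out in brackets (assumed writer behaviour)."""
    valence, h_count = implicit_h_count(symbol, bond_order_sum)
    return h_count > 0 and valence != ALLOWED_VALENCES[symbol][0]


print(implicit_h_count("S", 3))   # S in CCS=O     -> (4, 1)
print(implicit_h_count("P", 4))   # P in FP(F)(F)F -> (5, 1)
print(implicit_h_count("I", 2))   # I in CIC       -> (3, 1)
print(needs_explicit_h("S", 3))   # True, hence CC[SH]=O
print(needs_explicit_h("S", 2))   # False, e.g. CSC stays as it is
```

Under these assumptions the S in 'CCS=O' lands on valence 4 with one hydrogen, which matches the 'CC[SH]=O' output, while a SMARTS pattern is never filled up with hydrogens in the first place.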

Sometimes just explaining your problem to others helps you find the
solution.
Thomas

On Sat, Apr 29, 2023 at 8:45 PM Wim Dehaen <wimdeh...@gmail.com> wrote:

> The reason for this is to prevent ambiguities caused by nonstandard,
> higher valences. When an atom exceeds its standard valence, the implicit
> hydrogen count can no longer be inferred unambiguously, so it must be
> specified explicitly. For S and P the standard valences are 2 and 3
> respectively, just like for O and N. But S also has nonstandard valences
> available: 4 and 6, as in sulfoxides and sulfones respectively. P can
> commonly have a valence of 5, as in phosphoranes.
> In your SMILES the S has a valence of at least 3, exceeding the standard
> valence of 2. This creates an ambiguity, where the SMILES parser has to
> decide whether the S has a valence of 4 or 6. Likewise, a round trip of
> the SMILES "FP(F)(F)F" through RDKit will convert it into
> "F[PH](F)(F)F". This keeps the notation consistent with F[PH2](F)F and
> distinguishable from FP(F)F. In general, when higher valence states are
> not possible RDKit will throw a valence error, but there are other
> elements that behave this way. For example, "CIC" will become C[IH]C.
>
> best wishes
> wim
>
>
> On Sat, Apr 29, 2023 at 12:20 PM Thomas <odioidenti...@gmail.com> wrote:
>
>> I am not a chemist, so this may be a silly question, but I am interested
>> in the logic behind it, also because other libraries (like OpenBabel)
>> behave differently.
>>
>> Why does RDKit sometimes write hydrogens explicitly?
>>
>> from rdkit import Chem
>> mol = Chem.MolFromSmiles('CCS=O', sanitize=False)
>> Chem.MolToSmiles(mol)
>> 'CC[SH]=O'
>>
>> The input SMILES is intended as a pattern, not a molecule. I build a mol
>> from it only to get the canonical SMILES, which will then be used as
>> SMARTS.
>> Logically, I don't understand how the number of H attached to the S can
>> be "guessed" by the library, yet cannot be left implicit.
>>
>> Furthermore, I have seen this behaviour only with S and P. I was
>> wondering whether it is confined to those elements or can happen with
>> any element.
>> Thank you
>> _______________________________________________
>> Rdkit-discuss mailing list
>> Rdkit-discuss@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>
>
