Thank you, Wim, for your clarification: so the library is not inferring the valence, it's "choosing" one.
I think I've found the solution to my issue: I should probably use
MolFromSmarts() instead:

    from rdkit import Chem
    mol = Chem.MolFromSmarts('CCS=O')
    Chem.MolToSmiles(mol)
    'CCS=O'

Sometimes just explaining your problem to others helps you find the
solution.

Thomas

On Sat, 29 Apr 2023 at 20:45, Wim Dehaen <wimdeh...@gmail.com> wrote:

> The reason for this is to prevent ambiguities due to nonstandard,
> higher valences. When an atom exceeds its standard valence, the
> implicit hydrogen count cannot be inferred, so it must be specified
> explicitly. For S and P the standard valence would be 2 and 3
> respectively, just like for O and N. But S has nonstandard valences
> available: 4 and 6, as in sulfoxides and sulfones respectively. P can
> commonly have a valence of 5, as in phosphoranes.
> In the SMILES you provided, the S has a valence of at least 3,
> exceeding the standard valence of 2. This creates an ambiguity, where
> the SMILES parser has to decide whether the S has a valence of 4 or 6.
> Likewise, a roundtrip through RDKit will convert the SMILES
> "FP(F)(F)F" into "F[PH](F)(F)F"; this means the notation is consistent
> with F[PH2](F)F and distinguishable from FP(F)F. In general, when no
> higher valence state is possible, RDKit will throw a valence error,
> but there are some more examples available. For example, "CIC" will
> become C[IH]C.
>
> best wishes
> wim
>
>
> On Sat, Apr 29, 2023 at 12:20 PM Thomas <odioidenti...@gmail.com> wrote:
>
>> I am not a chemist, so this may be a silly question, but I am
>> interested in the logic behind it, also because other libraries
>> (like OpenBabel) behave differently.
>>
>> Why does RDKit sometimes write hydrogens explicitly?
>>
>> from rdkit import Chem
>> mol = Chem.MolFromSmiles('CCS=O', sanitize=False)
>> Chem.MolToSmiles(mol)
>> 'CC[SH]=O'
>>
>> The input SMILES is intended as a pattern, not a molecule. I make a
>> mol out of it only to get the canonical SMILES, which will then be
>> used as SMARTS.
>> Logically, I don't understand how the number of H attached to the S
>> can be "guessed" by the library, yet cannot be left implicit.
>>
>> Furthermore, I have seen this behaviour only with S and P. I was
>> wondering whether it is confined to those elements or can happen
>> with any element.
>> Thank you
>> _______________________________________________
>> Rdkit-discuss mailing list
>> Rdkit-discuss@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
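
For readers following the thread, the rule Wim describes (pick the lowest allowed valence that fits the bonds already drawn, then fill the remainder with hydrogens) can be sketched in plain Python. This is only an illustration, not RDKit's actual implementation: the table `ALLOWED_VALENCES` and the function `implicit_h_count` are hypothetical names, the S and P values are the ones mentioned in the thread, and the iodine entry is an assumption chosen to match the "CIC" example.

```python
# Hypothetical valence table, NOT taken from RDKit's source.
# S and P values come from the thread; the I entry is an assumption
# made so that the "CIC" -> "C[IH]C" example works out.
ALLOWED_VALENCES = {
    "S": [2, 4, 6],
    "P": [3, 5],
    "I": [1, 3],
}


def implicit_h_count(element, explicit_valence):
    """Return the hydrogens needed to reach the lowest allowed valence
    that is at least as large as the bonds already drawn."""
    for valence in ALLOWED_VALENCES[element]:
        if valence >= explicit_valence:
            return valence - explicit_valence
    # No allowed valence can accommodate the drawn bonds.
    raise ValueError(
        f"explicit valence {explicit_valence} is too high for {element}"
    )


if __name__ == "__main__":
    # S in CCS=O: one single bond plus one double bond = 3
    print("S in CCS=O:", implicit_h_count("S", 3))
    # P in FP(F)(F)F: four single bonds = 4
    print("P in FP(F)(F)F:", implicit_h_count("P", 4))
    # I in CIC: two single bonds = 2
    print("I in CIC:", implicit_h_count("I", 2))
    # P in FP(F)F: three single bonds = 3, the standard valence
    print("P in FP(F)F:", implicit_h_count("P", 3))
```

Under these assumptions the S in CCS=O gets one hydrogen (valence 4 is chosen over 6), which matches the 'CC[SH]=O' output in the original question, while FP(F)F needs none and so stays unbracketed.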