Re: [Rdkit-discuss] Working with SDF from varying locales?
On Fri, Sep 30, 2022 at 4:35 PM Rocco Moretti wrote:

> Hi Greg,
>
> > The RDKit doesn't normally convert data field values into floats unless
> > you explicitly ask it to
>
> I did notice that mol.GetProp() will always return things by string, and
> you would need to use mol.GetDoubleProp() if you explicitly wanted a
> numeric value, but it looks like mol.GetPropsAsDict() will automatically
> convert to integers/floating point as appropriate. I guess I was wondering
> if there was a way to get GetPropsAsDict() to be more gregarious with the
> locale (and/or make GetDoubleProp() more robust to not raising an
> exception).

I don't believe that there is.

> But if I need to handle the locale re-parsing on my own, I can probably
> knock something together to do that.

I think this will be necessary, particularly since it sounds like you need to try multiple locales anyway.

> Luckily the CTAB section in my files are all the same C locale, so I don't
> have to worry about that headache.

That's at least something to be grateful for! :-)

-greg

___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
Re: [Rdkit-discuss] Working with SDF from varying locales?
Hi Greg,

> The RDKit doesn't normally convert data field values into floats unless you explicitly ask it to

I did notice that mol.GetProp() always returns values as strings, and that you need mol.GetDoubleProp() if you explicitly want a numeric value, but it looks like mol.GetPropsAsDict() automatically converts to integers or floating point as appropriate. I guess I was wondering whether there is a way to get GetPropsAsDict() to be more gregarious with the locale (and/or to make GetDoubleProp() more robust, so that it doesn't raise an exception).

But if I need to handle the locale re-parsing on my own, I can probably knock something together to do that.

Luckily the CTAB sections in my files all use the same C locale, so I don't have to worry about that headache.

Thanks,
Rocco

On Fri, Sep 30, 2022 at 9:21 AM Greg Landrum wrote:

> Hi Rocco,
>
> Paolo already replied about the options available for python when
> interpreting the data fields from an SDF. The RDKit doesn't normally
> convert data field values into floats unless you explicitly ask it to, so
> this would be fine to do from Python
>
> The CTAB part of the SDF, which includes the coordinates, always parses
> the coordinates using the C locale (regardless of what the current locale
> on the machine is)... this is more or less part of the CTAB spec from MDL.
>
> -greg
>
> On Thu, Sep 29, 2022 at 8:16 PM Rocco Moretti wrote:
>
>> Hello,
>>
>> I have a number of SDFs of molecules with associated data blocks. (That
>> is, the `>` section that comes after `M END` and before `$$$$`.)
>>
>> The problem I have is that these SDFs were generated in different
>> countries, and have different locales -- most notably, some of them use "."
>> as the decimal separator for real-valued properties and some use ",". To
>> make things even more fun, some use a mix of both, depending on who
>> calculated which properties where.
>>
>> Is there any facility in RDKit for reading in such locale-varying SDF
>> files and normalizing them?
>>
>> Thanks,
>> Rocco
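One way to "knock something together" for the mixed-separator case is a small heuristic that does not depend on the system locale at all. The sketch below is only an illustration: `parse_decimal` and `normalize_props` are hypothetical helper names, not RDKit functions, and the rule it applies (when both separators occur, whichever comes last is the decimal mark; a lone comma is treated as a decimal mark) is an assumption that will misread a value such as "1,234" if the comma was meant as a thousands separator.

```python
def parse_decimal(text):
    """Return a float for text using '.' or ',' as decimal mark, else None.

    Hypothetical helper; the separator rule is a heuristic, not a standard.
    """
    s = text.strip()
    try:
        # Plain C-locale numbers ("1.5", "3", "1e-3") parse directly.
        return float(s)
    except ValueError:
        pass
    if "," in s and "." in s:
        # Assumption: the separator that appears last is the decimal mark.
        if s.rfind(",") > s.rfind("."):
            s = s.replace(".", "").replace(",", ".")
        else:
            s = s.replace(",", "")
    elif s.count(",") == 1:
        # Assumption: a single comma is a decimal comma.
        s = s.replace(",", ".")
    else:
        return None
    try:
        return float(s)
    except ValueError:
        return None


def normalize_props(props):
    """Convert numeric-looking string values to float; leave the rest as-is."""
    out = {}
    for name, raw in props.items():
        value = parse_decimal(raw)
        out[name] = raw if value is None else value
    return out
```

With RDKit, the input dict could be built from the string values, for example `{name: mol.GetProp(name) for name in mol.GetPropNames()}`, so that nothing is converted before the separators are normalized.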
Re: [Rdkit-discuss] Working with SDF from varying locales?
Hi Rocco,

Paolo already replied about the options available in Python for interpreting the data fields from an SDF. The RDKit doesn't normally convert data field values into floats unless you explicitly ask it to, so this would be fine to do from Python.

The CTAB part of the SDF, which includes the coordinates, is always parsed using the C locale (regardless of what the current locale on the machine is)... this is more or less part of the CTAB spec from MDL.

-greg

On Thu, Sep 29, 2022 at 8:16 PM Rocco Moretti wrote:

> Hello,
>
> I have a number of SDFs of molecules with associated data blocks. (That
> is, the `>` section that comes after `M END` and before `$$$$`.)
>
> The problem I have is that these SDFs were generated in different
> countries, and have different locales -- most notably, some of them use "."
> as the decimal separator for real-valued properties and some use ",". To
> make things even more fun, some use a mix of both, depending on who
> calculated which properties where.
>
> Is there any facility in RDKit for reading in such locale-varying SDF
> files and normalizing them?
>
> Thanks,
> Rocco
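Since only the data block is locale-sensitive, it can help to look at the raw field values before anything tries to convert them. The following is a minimal sketch of pulling the data fields out of one SDF record as plain strings. It is not how RDKit parses the file, `read_data_fields` is a hypothetical name, and it assumes the usual layout: a `>` header line containing `<FieldName>`, then the value lines, then a blank line.

```python
import re

# Header line of a data item, e.g. ">  <logP>"; captures the field name.
_FIELD_HEADER = re.compile(r"^>.*?<([^>]+)>")
# End of the CTAB; whitespace between "M" and "END" is matched loosely.
_CTAB_END = re.compile(r"^M\s+END")


def read_data_fields(record):
    """Return {field name: raw string value} for one SDF record (sketch)."""
    lines = record.splitlines()
    start = None
    for i, line in enumerate(lines):
        if _CTAB_END.match(line):
            start = i + 1
            break
    if start is None:
        return {}

    fields = {}
    current = None
    for line in lines[start:]:
        if line.startswith("$$$$"):
            break  # end of this record
        header = _FIELD_HEADER.match(line)
        if header:
            current = header.group(1)
            fields[current] = []
        elif current is not None:
            if line.strip() == "":
                current = None  # a blank line closes the data item
            else:
                fields[current].append(line)
    # Multi-line values are rejoined; nothing is converted to a number here.
    return {name: "\n".join(value) for name, value in fields.items()}
```

Everything returned is still a string, so a decimal comma such as "2,35" survives untouched and can be normalized in a later step.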
Re: [Rdkit-discuss] Working with SDF from varying locales?
Hi Rocco,

Python's locale module will allow you to do this sort of normalization on strings, e.g.:

    >>> import locale
    >>> locale.getlocale()
    ('en_US', 'UTF-8')
    >>> locale.setlocale(locale.LC_ALL, "it_IT")
    'it_IT'
    >>> locale.delocalize("1,222")
    '1.222'

But this requires you to know the locale the values were originally encoded in.

HTH, cheers
p.

On Thu, Sep 29, 2022 at 8:16 PM Rocco Moretti wrote:

> Hello,
>
> I have a number of SDFs of molecules with associated data blocks. (That
> is, the `>` section that comes after `M END` and before `$$$$`.)
>
> The problem I have is that these SDFs were generated in different
> countries, and have different locales -- most notably, some of them use "."
> as the decimal separator for real-valued properties and some use ",". To
> make things even more fun, some use a mix of both, depending on who
> calculated which properties where.
>
> Is there any facility in RDKit for reading in such locale-varying SDF
> files and normalizing them?
>
> Thanks,
> Rocco
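Building on the `locale.delocalize()` idea, a sketch for the case where the original locale is unknown is to try a list of candidate locales in turn. The helper names (`numeric_locale`, `parse_with_locales`) and the candidate list are assumptions for illustration; locale names differ between platforms, and a locale that is not installed is simply skipped. `setlocale` changes process-wide state, so this is not safe to call from several threads at once, and the order of the candidates decides how an ambiguous string such as "1.222" is read.

```python
import locale
from contextlib import contextmanager


@contextmanager
def numeric_locale(name):
    """Temporarily switch LC_NUMERIC to `name`, restoring it afterwards."""
    saved = locale.setlocale(locale.LC_NUMERIC)  # query the current setting
    try:
        locale.setlocale(locale.LC_NUMERIC, name)
        yield
    finally:
        locale.setlocale(locale.LC_NUMERIC, saved)


def parse_with_locales(text, candidates=("C", "it_IT.UTF-8", "de_DE.UTF-8")):
    """Return (value, locale name) for the first candidate that parses text.

    Returns (None, None) if no candidate works. The candidate names are
    assumptions; adjust them to the locales available on your system.
    """
    for name in candidates:
        try:
            with numeric_locale(name):
                return float(locale.delocalize(text.strip())), name
        except (locale.Error, ValueError):
            # Locale not installed, or the string is not a number in it.
            continue
    return None, None
```

For example, `parse_with_locales("1.5")` is parsed by the "C" candidate, while "1,222" is only parsed if one of the decimal-comma locales is installed.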