Hi Greg,

> The RDKit doesn't normally convert data field values into floats unless
> you explicitly ask it to

I did notice that mol.GetProp() always returns values as strings, and that
you need mol.GetDoubleProp() if you explicitly want a numeric value, but it
looks like mol.GetPropsAsDict() automatically converts to integers/floating
point as appropriate. I guess I was wondering if there was a way to get
GetPropsAsDict() to be more flexible about the locale (and/or to make
GetDoubleProp() robust enough not to raise an exception).

But if I need to handle the locale re-parsing on my own, I can probably
knock something together to do that.
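
A minimal sketch of what such re-parsing could look like, assuming every
property value is first read as a plain string (as mol.GetProp() returns it).
The names parse_locale_float and normalize_props are hypothetical helpers, not
part of RDKit, and the rule "a single ',' or '.' is the decimal separator" is
an assumption: values with thousands separators are left as strings rather
than guessed at.

```python
import re

# One optional sign, digits with at most one "." or "," as the decimal
# separator, and an optional exponent. Anything else is not treated as a number.
_NUMERIC = re.compile(r"[+-]?(\d+([.,]\d*)?|[.,]\d+)([eE][+-]?\d+)?")


def parse_locale_float(text):
    """Return a float for text using '.' or ',' as decimal separator, else None."""
    s = text.strip()
    if not _NUMERIC.fullmatch(s):
        return None
    # Assumption: a lone comma is a decimal separator, never a thousands mark.
    return float(s.replace(",", "."))


def normalize_props(props):
    """Convert numeric-looking string values to floats; keep the rest as-is."""
    out = {}
    for name, value in props.items():
        number = parse_locale_float(value) if isinstance(value, str) else None
        out[name] = value if number is None else number
    return out


# Illustrative values only, not taken from a real file.
example = {"logP": "2,31", "MW": "180.16", "Name": "aspirin"}
print(normalize_props(example))
```

The input dictionary could be built per molecule with something like
{name: mol.GetProp(name) for name in mol.GetPropNames()}, so that no automatic
conversion happens before the separators are normalized.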

Luckily, the CTAB sections in my files all use the C locale, so I don't
have to worry about that headache.

Thanks,
Rocco

On Fri, Sep 30, 2022 at 9:21 AM Greg Landrum <greg.land...@gmail.com> wrote:

> Hi Rocco,
>
> Paolo already replied about the options available for python when
> interpreting the data fields from an SDF. The RDKit doesn't normally
> convert data field values into floats unless you explicitly ask it to, so
> this would be fine to do from Python.
>
> The CTAB part of the SDF, which includes the coordinates, always parses
> the coordinates using the C locale (regardless of what the current locale
> on the machine is)... this is more or less part of the CTAB spec from MDL.
>
> -greg
>
>
> On Thu, Sep 29, 2022 at 8:16 PM Rocco Moretti <rmoretti...@gmail.com>
> wrote:
>
>> Hello,
>>
>> I have a number of SDFs of molecules with associated data blocks. (That
>> is, the `>` section that comes after `M END` and before `$$$$`.)
>>
>> The problem I have is that these SDFs were generated in different
>> countries, and have different locales -- most notably, some of them use "."
>> as the decimal separator for real-valued properties and some use ",".  To
>> make things even more fun, some use a mix of both, depending on who
>> calculated which properties where.
>>
>> Is there any facility in RDKit for reading in such locale-varying SDF
>> files and normalizing them?
>>
>> Thanks,
>> Rocco
>> _______________________________________________
>> Rdkit-discuss mailing list
>> Rdkit-discuss@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>
>
