Re: [Rdkit-discuss] Working with SDF from varying locales?

2022-09-30 Thread Greg Landrum
On Fri, Sep 30, 2022 at 4:35 PM Rocco Moretti wrote:

> Hi Greg,
>
> > The RDKit doesn't normally convert data field values into floats unless
> you explicitly ask it to
>
> I did notice that mol.GetProp() always returns values as strings, and
> that you need to use mol.GetDoubleProp() if you explicitly want a
> numeric value, but it looks like mol.GetPropsAsDict() automatically
> converts to integers/floating point as appropriate. I guess I was wondering
> if there was a way to get GetPropsAsDict() to be more lenient about the
> locale (and/or to make GetDoubleProp() more robust, so that it doesn't
> raise an exception).
>

I don't believe that there is.

> But if I need to handle the locale re-parsing on my own, I can probably
> knock something together to do that.
>

I think this will be necessary, particularly since it sounds like you need
to try multiple locales anyway.
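
A minimal sketch of what such do-it-yourself re-parsing might look like, assuming the values only ever use "." or "," as separators. The function name parse_decimal is hypothetical, and the rule for values containing both separators (the right-most one is the decimal separator) is an assumption, not something RDKit provides:

```python
from typing import Optional


def parse_decimal(text: str) -> Optional[float]:
    """Parse a number that uses either '.' or ',' as the decimal separator.

    Returns None if the text does not look like a number.
    """
    s = text.strip().replace(" ", "")
    if "," in s and "." in s:
        # Both separators present: assume the right-most one is the decimal
        # separator and the other one is a thousands separator.
        if s.rfind(",") > s.rfind("."):
            s = s.replace(".", "").replace(",", ".")
        else:
            s = s.replace(",", "")
    elif "," in s:
        # Only commas: assume a single comma is a decimal separator and
        # refuse anything with several commas as ambiguous.
        if s.count(",") != 1:
            return None
        s = s.replace(",", ".")
    try:
        return float(s)
    except ValueError:
        return None
```

The strings to feed into it could come from mol.GetProp(name) for each name in mol.GetPropNames(), which avoids the automatic conversion done by GetPropsAsDict(). Note that a value like "1,234" is inherently ambiguous; this sketch reads it as 1.234.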



> Luckily the CTAB sections in my files all use the same C locale, so I don't
> have to worry about that headache.
>

That's at least something to be grateful for! :-)

-greg
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Working with SDF from varying locales?

2022-09-30 Thread Rocco Moretti
Hi Greg,

> The RDKit doesn't normally convert data field values into floats unless
you explicitly ask it to

I did notice that mol.GetProp() always returns values as strings, and
that you need to use mol.GetDoubleProp() if you explicitly want a
numeric value, but it looks like mol.GetPropsAsDict() automatically
converts to integers/floating point as appropriate. I guess I was wondering
if there was a way to get GetPropsAsDict() to be more lenient about the
locale (and/or to make GetDoubleProp() more robust, so that it doesn't
raise an exception).

But if I need to handle the locale re-parsing on my own, I can probably
knock something together to do that.

Luckily the CTAB sections in my files all use the same C locale, so I don't
have to worry about that headache.

Thanks,
Rocco



Re: [Rdkit-discuss] Working with SDF from varying locales?

2022-09-30 Thread Greg Landrum
Hi Rocco,

Paolo already replied about the options available in Python for
interpreting the data fields from an SDF. The RDKit doesn't normally
convert data field values into floats unless you explicitly ask it to, so
this would be fine to do from Python.

The CTAB part of the SDF, which includes the coordinates, always parses the
coordinates using the C locale (regardless of what the current locale on
the machine is)... this is more or less part of the CTAB spec from MDL.

-greg


On Thu, Sep 29, 2022 at 8:16 PM Rocco Moretti wrote:

> Hello,
>
> I have a number of SDFs of molecules with associated data blocks. (That
> is, the `>` section that comes after `M END` and before `$$$$`.)
>
> The problem I have is that these SDFs were generated in different
> countries, and have different locales -- most notably, some of them use "."
> as the decimal separator for real-valued properties and some use ",".  To
> make things even more fun, some use a mix of both, depending on who
> calculated which properties where.
>
> Is there any facility in RDKit for reading in such locale-varying SDF
> files and normalizing them?
>
> Thanks,
> Rocco


Re: [Rdkit-discuss] Working with SDF from varying locales?

2022-09-30 Thread Paolo Tosco
Hi Rocco,

The Python locale module will let you do this sort of normalization
on strings, e.g.:

    >>> import locale
    >>> locale.getlocale()
    ('en_US', 'UTF-8')
    >>> locale.setlocale(locale.LC_ALL, "it_IT")
    'it_IT'
    >>> locale.delocalize("1,222")
    '1.222'


But this requires you to know the locale the values were originally encoded in.
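
When the locale of a given value is known, one possible way to wrap the calls above is a small helper that switches LC_NUMERIC only for the duration of the conversion and then restores the previous setting. This is a sketch under assumptions: the names numeric_locale and to_float are hypothetical, the requested locale (e.g. "it_IT") must be installed on the machine or setlocale() raises locale.Error, and changing the locale affects the whole process, so it is not thread-safe:

```python
import locale
from contextlib import contextmanager


@contextmanager
def numeric_locale(name):
    """Temporarily switch LC_NUMERIC to `name`, then restore the old value."""
    saved = locale.setlocale(locale.LC_NUMERIC)  # query the current setting
    try:
        locale.setlocale(locale.LC_NUMERIC, name)
        yield
    finally:
        locale.setlocale(locale.LC_NUMERIC, saved)


def to_float(text, locale_name):
    """Convert `text`, written in the conventions of `locale_name`, to float."""
    with numeric_locale(locale_name):
        # delocalize() drops the thousands separator and turns the locale's
        # decimal point into '.', so that float() can parse the result.
        return float(locale.delocalize(text))
```

With the Italian locale installed, to_float("1,222", "it_IT") would go through the same delocalize() step as in the session above.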


HTH, cheers

p.

