On Jun 9, 2014, at 9:40 PM, Christian K. <[email protected]> wrote:
> On 09.06.14 16:00, [email protected] wrote:
>>
>> On Jun 9, 2014, at 2:53 PM, Christian K. <[email protected]> wrote:
>>
>>> <Paul_Koning <at> Dell.com> writes:
>>>
>>>>
>>>>
>>>> On Jun 9, 2014, at 9:07 AM, Christian K. <ckkart <at> hoc.net> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I was very pleased to see that retrieving properties of a MAPI object
>>>>> yields either a <str> or <bytes> type depending on whether the _A or _W
>>>>> property was queried …
>>>>
>>>> Really? That seems strange. As I recall, the *_W APIs are “wide
>>>> character” ones. So in Python 3, they should both map to <str> type.
>>>> <bytes> applies only to non-text data.
>>>
>>> At least for text properties such as PR_SUBJECT_A / _W, the former returns
>>> an MBCS-encoded "string", i.e. of bytes type, and the latter a 2-byte
>>> Unicode string. Binary properties are always returned as bytes, in contrast
>>> to the earlier behavior under Python 2.
>>
>> Yes, “bytes” for binary values is clearly correct. But MBCS and “2 byte
>> Unicode” (more accurately called either UCS-2 or UCS-2 BMP subset, not sure
>> which) are both text strings. The different encoding in the API doesn’t
>> mean they should be different datatypes in Python 3; both cases should
>> properly map to “str”.
>
> No, this is not what I am seeing. MBCS-encoded properties, i.e. those
> ending in _A, are mapped to 'bytes' and the _W ones to 'str', which is
> consistent with how Python 3 handles Unicode text versus encoded data.
> And this is great indeed, because having strings that may or may not be
> encoded share the same type is really painful.
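
For readers less familiar with MAPI: a property tag packs a 16-bit property ID into its high word and a 16-bit type into its low word, so PR_SUBJECT_A and PR_SUBJECT_W share an ID and differ only in type (PT_STRING8 vs PT_UNICODE). The sketch below restates the behaviour Christian reports (bytes for PT_STRING8 and PT_BINARY, str for PT_UNICODE) as a simple lookup. The constants are the standard MAPI values, defined locally so the sketch runs without pywin32; `observed_python_type` is a hypothetical helper, not part of any library, and the mapping reflects this thread's description rather than a reading of pywin32's source.

```python
# Standard MAPI property type codes (low 16 bits of a property tag).
PT_STRING8 = 0x001E   # 8-bit string in the client's code page ("_A")
PT_UNICODE = 0x001F   # UTF-16 string ("_W")
PT_BINARY = 0x0102    # opaque binary blob

# PR_SUBJECT in both flavors: same property ID (0x0037), different type.
PR_SUBJECT_A = 0x0037001E
PR_SUBJECT_W = 0x0037001F


def prop_id(tag: int) -> int:
    """High word of the tag: the property identifier."""
    return (tag >> 16) & 0xFFFF


def prop_type(tag: int) -> int:
    """Low word of the tag: the property type."""
    return tag & 0xFFFF


# Hypothetical table restating the behaviour reported in this thread:
# _A strings and binary values arrive as bytes, _W strings as str.
_OBSERVED = {PT_STRING8: bytes, PT_UNICODE: str, PT_BINARY: bytes}


def observed_python_type(tag: int) -> type:
    """Python 3 type that, per the thread, a value with this tag arrives as."""
    return _OBSERVED[prop_type(tag)]
```

With these definitions, `observed_python_type(PR_SUBJECT_A)` is `bytes` and `observed_python_type(PR_SUBJECT_W)` is `str`, even though `prop_id` is 0x0037 for both.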

Perhaps I’m missing something.

I’m used to Windows API calls that come in foo_A and foo_W flavors, the only
difference being that the _A flavor takes ANSI (8-bit code page) arguments and
the _W flavor takes Unicode arguments, for those arguments that are,
abstractly, strings.

In Python 3, the “str” type is an abstract string; its character repertoire is
Unicode, but it doesn’t have an encoding. Instead, encoding and decoding are
done when it is converted to or from external interfaces: files, external API
calls, etc.
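
To make the point concrete: one Python 3 str can be turned into several different byte sequences, depending on which encoding the external interface expects, and each decodes back to the same str. A minimal illustration using only standard codecs (the sample text is arbitrary):

```python
text = "Grüße"  # one abstract str; no encoding is attached to it

# Encoding happens only at the boundary, and each interface may want a different one.
as_latin1 = text.encode("latin-1")    # 1 byte per character
as_utf8 = text.encode("utf-8")        # ü and ß take 2 bytes each
as_utf16 = text.encode("utf-16-le")   # 2 bytes per character, no BOM

for raw, codec in ((as_latin1, "latin-1"), (as_utf8, "utf-8"), (as_utf16, "utf-16-le")):
    # Decoding at the boundary gives back the identical str.
    assert raw.decode(codec) == text
    print(codec, len(raw))
```

This prints lengths of 5, 7 and 10 bytes for the same five-character string.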

So... I would expect both foo_A and foo_W to take “str” arguments, with the
interface machinery between Python 3 and those functions applying the
appropriate encoding to produce the string representation each one expects.

For example, if a given API wants strings in ASCII form, the machinery would
call str.encode("ascii"), or perhaps str.encode("latin1"). If it wants MBCS
data, it would encode to that encoding; if 2-byte Unicode, to UCS-2 (in
practice Python's "utf-16-le" codec, since Python has no "ucs-2" codec). And
so on. Ditto in the reverse direction, when strings are delivered by an
external function.
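
The per-API encodings listed above map onto standard Python codecs roughly as follows. "utf-16-le" produces the same bytes as UCS-2 for characters in the Basic Multilingual Plane, and "mbcs" is available only on Windows, so it is guarded below. A short sketch with an arbitrary sample string:

```python
import codecs
import sys

text = "café"

print(text.encode("latin1"))      # b'caf\xe9'
print(text.encode("utf-16-le"))   # b'c\x00a\x00f\x00\xe9\x00'

# ASCII cannot represent "é", so strict encoding fails; the caller must pick a policy.
try:
    text.encode("ascii")
except UnicodeEncodeError:
    print("not representable in ASCII")
print(text.encode("ascii", errors="replace"))  # b'caf?'

# The standard library has no "ucs-2" codec; utf-16-le is the usual substitute.
try:
    codecs.lookup("ucs-2")
except LookupError:
    print("no ucs-2 codec")

# "mbcs" (the Windows ANSI code page) exists only on Windows.
if sys.platform == "win32":
    print(text.encode("mbcs"))
```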

I would only want or expect to see “bytes” values when the data in question is
a binary stream or of unknown format. Any time we’re dealing with text
strings, the Python 3 approach is that Python code sees the “str” type, and
questions of encoding have already been handled at the edge. This is where
Python 3 gets it right and Python 2 was a big muddle.
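
Sketched in code, the behaviour argued for here would decode both string flavors at the boundary and leave only genuinely binary values as bytes. This is a hypothetical normalizing helper, not pywin32's actual behaviour; the type codes are the standard MAPI values, and cp1252 again stands in (as an assumption) for the client's ANSI code page, where on Windows one would use "mbcs".

```python
PT_STRING8 = 0x001E   # _A string property type
PT_UNICODE = 0x001F   # _W string property type
PT_BINARY = 0x0102    # binary property type


def normalize_prop(tag: int, value, ansi_codec: str = "cp1252"):
    """Hypothetical edge adapter: text properties become str, binary stays bytes."""
    ptype = tag & 0xFFFF
    if ptype == PT_STRING8 and isinstance(value, bytes):
        # _A string: decode from the (assumed) ANSI code page.
        return value.decode(ansi_codec)
    if ptype == PT_UNICODE:
        # _W string: already text.
        return value
    if ptype == PT_BINARY:
        # Opaque data: keep as bytes.
        return bytes(value)
    # Other property types (integers, times, ...) pass through unchanged.
    return value
```

Under this scheme, reading PR_SUBJECT_A (0x0037001E) and PR_SUBJECT_W (0x0037001F) both yield str. That is exactly the trade-off debated in this thread: callers no longer learn from the type alone which flavor a value came from.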

Mark, could you clarify how you would expect this to work?

paul
_______________________________________________
python-win32 mailing list
[email protected]
https://mail.python.org/mailman/listinfo/python-win32