[issue34481] Different behavior of C and Python impls of datetime.strftime with non-UTF-8-encodable strings

2018-09-05 Thread Raymond Hettinger


Change by Raymond Hettinger :


--
nosy: +brett.cannon

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue34481] Different behavior of C and Python impls of datetime.strftime with non-UTF-8-encodable strings

2018-09-05 Thread Paul Ganssle


Paul Ganssle  added the comment:

I've left a mostly finished PR on #8983, but I've decided it's not really worth 
continuing to work on. Anyone can feel free to take it and run with it if they 
want to implement the fix for this.

As Alexey mentions, that PR indeed already fixes this bug, I mainly blocked on 
getting consistent behavior across all platforms. Currently that PR has 
consistent behavior across all platforms if you disable wcsftime *except* 
Ubuntu 14.0, which seems to have some kind of funky bug in wcstombs, though for 
the life of me I don't know how to consistently trigger it. I suspect that 
using strftime instead of wcsftime on all platforms would lead to simpler code 
that works most consistently, but I am guessing wcsftime is faster.

Another option is to use wsftime but fall back to strftime on MacOS in the 
event that the error condition (blank output from non-blank input) is detected.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue34481] Different behavior of C and Python impls of datetime.strftime with non-UTF-8-encodable strings

2018-08-30 Thread Alexey Izbyshev


Alexey Izbyshev  added the comment:

> if we can't make assertions about the behavior of strftime outputs, I think 
> it makes it hard to prevent regressions.

Ironically, some of the changes we may have to make to fix time.strftime() 
inconsistencies may appear like regressions to some users. For example, 
regarding dropping wcsftime() on all platforms, 'man strftime' on macOS 
contains the following warning in BUGS section: "The strftime() function does 
not correctly handle multibyte characters in the format argument.". (I'm not 
sure what "incorrect handling" means here). That's not to discourage you, just 
to point out that we should be extra careful at this point when there might be 
users relying on every variety of the current behavior. And that's also why I 
don't want to mix the fix for C vs. Python issue (low risk) with changes in 
time.strftime() (high risk).

Also, while I certainly agree with "the outsized effort" point, it seems that 
your PR goes far beyond supporting surrogates because falling back to escaping 
allows one to bypass checks in wcsftime()/strftime() and round-trip any code 
point regardless of the current locale.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue34481] Different behavior of C and Python impls of datetime.strftime with non-UTF-8-encodable strings

2018-08-30 Thread Paul Ganssle


Paul Ganssle  added the comment:

@izbyshev That's totally fair and I wouldn't want to make it a condition of 
merging the existing fixes - I've already made great progress in fixing the 
time.strftime part as well.

The main reason it relates here is that I generally find the tests to be among 
the hardest part about writing a good PR, and if we can't make assertions about 
the behavior of strftime outputs, I think it makes it hard to prevent 
regressions. I figured if I can solve the problem all the way down the stack in 
one go, I might as well.

That said, Victor makes an *extremely* good point that this is an outsized 
effort for the bug it's fixing. No one really *needs* support for unpaired 
surrogates in their strftime as far as I can tell. The main reason I'm still 
working on it is that I'm curious to see if it's even possible to fix.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue34481] Different behavior of C and Python impls of datetime.strftime with non-UTF-8-encodable strings

2018-08-29 Thread Alexey Izbyshev


Alexey Izbyshev  added the comment:

@p-ganssle

> I'm finding it very difficult to reconcile these things.

Paul, this issue was only about reconciling C and Python implementations of 
datetime module -- not fixing time.strftime(), which both of them delegate to. 
While the latter is certainly more important, it's also much more difficult, 
and IMO needs another issue report (maybe merged with #34512). As for the 
former, it seems that you've already fixed it in your PR.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue34481] Different behavior of C and Python impls of datetime.strftime with non-UTF-8-encodable strings

2018-08-29 Thread STINNER Victor


STINNER Victor  added the comment:

> Because caring of surrogates,

Before*

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue34481] Different behavior of C and Python impls of datetime.strftime with non-UTF-8-encodable strings

2018-08-29 Thread STINNER Victor


STINNER Victor  added the comment:

Because caring of surrogates, I would prefer to first fix time.strftime() and 
time.strptime() for legit time zone names. Currently, Python uses the C 
function strfime() because the string produced by wcsftime() cannot be parsed 
later. We should use the native Windows API to get the time zone name properly 
encoded. I don't recall the details, sorry. There is maybe an open issue for 
that one.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue34481] Different behavior of C and Python impls of datetime.strftime with non-UTF-8-encodable strings

2018-08-29 Thread Paul Ganssle


Paul Ganssle  added the comment:

I'm finding it very difficult to reconcile these things. I'm not entirely sure, 
but we may need to take a performance hit in normal strftime if we want to make 
this work with surrogate characters, which really does not appeal to me (though 
we can certainly improve to some degree).

One major question here: Is anyone (@vstinner, @belopolsky?) know why time's 
strftime opportunistically uses wcsftime instead of strftime? It makes the code 
*way* more complicated and difficult to read / maintain - are there platforms 
that provide wcstrftime and not strftime?

Also, related, it seems according to 
https://bugs.python.org/issue10653#msg243660 that there may have been a 
regression in issue #10653, which may be related here.

Either way, some note should probably be made in the code to clarify exactly 
*why* these choices were made in the code, in case the situation has changed.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue34481] Different behavior of C and Python impls of datetime.strftime with non-UTF-8-encodable strings

2018-08-28 Thread STINNER Victor


Change by STINNER Victor :


--
nosy: +vstinner

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue34481] Different behavior of C and Python impls of datetime.strftime with non-UTF-8-encodable strings

2018-08-28 Thread Paul Ganssle


Change by Paul Ganssle :


--
keywords: +patch
pull_requests: +8457
stage:  -> patch review

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue34481] Different behavior of C and Python impls of datetime.strftime with non-UTF-8-encodable strings

2018-08-23 Thread Paul Ganssle


Change by Paul Ganssle :


--
nosy: +p-ganssle

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue34481] Different behavior of C and Python impls of datetime.strftime with non-UTF-8-encodable strings

2018-08-23 Thread Alexey Izbyshev


New submission from Alexey Izbyshev :

The C datetime implementation uses PyUnicode_AsUTF8AndSize() in wrap_strftime() 
and rejects strings containing surrogate code points (0xD800 - 0xDFFF) since 
they can't be encoded in UTF-8. On the other hand, the pure-Python datetime 
implementation doesn't have this restriction:

>>> import sys
>>> sys.modules['_datetime'] = None # block C implementation
>>> from datetime import time
>>> time().strftime('\ud800')
'\ud800'
>>> del sys.modules['datetime']
>>> del sys.modules['_datetime']
>>> from datetime import time
>>> time().strftime('\ud800')
Traceback (most recent call last):
  File "", line 1, in 
UnicodeEncodeError: 'utf-8' codec can't encode character '\ud800' in position 
0: surrogates not allowed

--
components: Extension Modules
messages: 323963
nosy: belopolsky, izbyshev, pitrou, serhiy.storchaka, taleinat
priority: normal
severity: normal
status: open
title: Different behavior of C and Python impls of datetime.strftime with 
non-UTF-8-encodable strings
type: behavior
versions: Python 3.6, Python 3.7, Python 3.8

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com