[issue34512] Document platform-specific strftime() behavior for non-ASCII format strings

2019-01-14 Thread STINNER Victor


STINNER Victor  added the comment:

A solution to make time.strftime() more portable would be to split the format 
string, format each "%xxx" directive separately, and never pass the literal 
text between directives to strftime(). There is a similar discussion about a 
trailing "%" in bpo-35066.
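
The splitting idea could look roughly like the sketch below. It is only an 
illustration under stated assumptions: portable_strftime() and the regular 
expression are hypothetical names, not existing or proposed CPython code, and 
the pattern only recognizes "%%" and single-letter directives with an optional 
E or O modifier.

```python
import re
import time

# Hypothetical pattern: "%%", or "%" plus an optional E/O modifier and a letter.
_DIRECTIVE = re.compile(r"%(?:%|[EO]?[A-Za-z])")


def portable_strftime(fmt, t=None):
    """Format only the directives; copy all other text through unchanged."""
    if t is None:
        t = time.localtime()
    parts = []
    pos = 0
    for match in _DIRECTIVE.finditer(fmt):
        # Literal text between directives never reaches the C library.
        parts.append(fmt[pos:match.start()])
        if match.group() == "%%":
            parts.append("%")
        else:
            parts.append(time.strftime(match.group(), t))
        pos = match.end()
    # Trailing literal text (including a lone trailing "%") is kept as-is.
    parts.append(fmt[pos:])
    return "".join(parts)
```

With this sketch, non-ASCII text and lone surrogates in the format string are 
returned untouched because they are never encoded for the C library; only the 
output of the directives themselves remains locale-dependent.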

--
nosy: +vstinner

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue34512] Document platform-specific strftime() behavior for non-ASCII format strings

2019-01-12 Thread Tal Einat


Change by Tal Einat :


--
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed
versions:  -Python 3.5, Python 3.6




[issue34512] Document platform-specific strftime() behavior for non-ASCII format strings

2019-01-12 Thread Tal Einat


Change by Tal Einat :


--
pull_requests:  -11135, 11137




[issue34512] Document platform-specific strftime() behavior for non-ASCII format strings

2019-01-12 Thread Tal Einat


Change by Tal Einat :


--
pull_requests:  -11135, 11137, 11139




[issue34512] Document platform-specific strftime() behavior for non-ASCII format strings

2019-01-12 Thread Tal Einat


Change by Tal Einat :


--
pull_requests:  -11135




[issue34512] Document platform-specific strftime() behavior for non-ASCII format strings

2019-01-12 Thread miss-islington


miss-islington  added the comment:


New changeset 77b80c956f39df34722bd8646cf5b83d149832c4 by Miss Islington (bot) 
in branch '2.7':
bpo-34512: Document platform-specific strftime() behavior for non-ASCII format 
strings (GH-8948)
https://github.com/python/cpython/commit/77b80c956f39df34722bd8646cf5b83d149832c4


--




[issue34512] Document platform-specific strftime() behavior for non-ASCII format strings

2019-01-12 Thread miss-islington


miss-islington  added the comment:


New changeset 678c5c07521caca809b1356d954975e6234c49ae by Miss Islington (bot) 
in branch '3.7':
bpo-34512: Document platform-specific strftime() behavior for non-ASCII format 
strings (GH-8948)
https://github.com/python/cpython/commit/678c5c07521caca809b1356d954975e6234c49ae


--
nosy: +miss-islington




[issue34512] Document platform-specific strftime() behavior for non-ASCII format strings

2019-01-12 Thread miss-islington


Change by miss-islington :


--
pull_requests: +11135, 11136, 11137, 11138, 11139




[issue34512] Document platform-specific strftime() behavior for non-ASCII format strings

2019-01-12 Thread miss-islington


Change by miss-islington :


--
pull_requests: +11135, 11136, 11137, 11139




[issue34512] Document platform-specific strftime() behavior for non-ASCII format strings

2019-01-12 Thread miss-islington


Change by miss-islington :


--
pull_requests: +11134, 11135, 11136, 11137




[issue34512] Document platform-specific strftime() behavior for non-ASCII format strings

2019-01-12 Thread miss-islington


Change by miss-islington :


--
pull_requests: +11134, 11135




[issue34512] Document platform-specific strftime() behavior for non-ASCII format strings

2019-01-12 Thread miss-islington


Change by miss-islington :


--
pull_requests: +11134, 11135, 11136




[issue34512] Document platform-specific strftime() behavior for non-ASCII format strings

2019-01-12 Thread miss-islington


Change by miss-islington :


--
pull_requests: +11134




[issue34512] Document platform-specific strftime() behavior for non-ASCII format strings

2019-01-12 Thread Tal Einat


Tal Einat  added the comment:


New changeset 1cffd0eed313011c0c2bb071c8affeb4a7ed05c7 by Tal Einat (Alexey 
Izbyshev) in branch 'master':
bpo-34512: Document platform-specific strftime() behavior for non-ASCII format 
strings (GH-8948)
https://github.com/python/cpython/commit/1cffd0eed313011c0c2bb071c8affeb4a7ed05c7


--




[issue34512] Document platform-specific strftime() behavior for non-ASCII format strings

2018-09-27 Thread Karthikeyan Singaravelan


Change by Karthikeyan Singaravelan :


--
nosy: +xtreak




[issue34512] Document platform-specific strftime() behavior for non-ASCII format strings

2018-09-27 Thread Tal Einat


Change by Tal Einat :


--
versions: +Python 2.7, Python 3.5




[issue34512] Document platform-specific strftime() behavior for non-ASCII format strings

2018-08-26 Thread Alexey Izbyshev


Change by Alexey Izbyshev :


--
keywords: +patch
pull_requests: +8424
stage:  -> patch review




[issue34512] Document platform-specific strftime() behavior for non-ASCII format strings

2018-08-26 Thread Alexey Izbyshev

New submission from Alexey Izbyshev :

If a format string contains code points outside the ASCII range, 
time.strftime() can behave in four different ways depending on the platform, 
the current locale and the code points:

* raise a UnicodeEncodeError
* return an empty string
* for surrogates in the \uDC80-\uDCFF range, replace them with different code 
points in the output (potentially mangling nearby parts of the output as well)
* round-trip them correctly

Some examples:

* Linux (glibc 2.27):
Python 3.6.4 (default, Jan 03 2018, 13:52:55) [GCC] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import time, locale
>>> locale.getlocale()
('en_US', 'UTF-8')
>>> time.strftime('\x80')
'\x80'
>>> time.strftime('\u044f')
'я' # '\u044f'
>>> time.strftime('\ud800')
'\ud800'
>>> time.strftime('\udcff')
'\udcff'
>>> locale.setlocale(locale.LC_CTYPE, 'C')
'C'
>>> time.strftime('\x80')
'\x80'
>>> time.strftime('\u044f')
'я' # '\u044f'
>>> time.strftime('\ud800')
'\ud800'
>>> time.strftime('\udcff')
'\udcff'

* macOS 10.13.6 and FreeBSD 11.1:
Python 3.7.0 (default, Jul 23 2018, 20:22:55)
[Clang 9.1.0 (clang-902.0.39.2)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import time, locale
>>> locale.getlocale()
('en_US', 'UTF-8')
>>> time.strftime('\x80')
'\x80'
>>> time.strftime('\u044f')
'я' # '\u044f'
>>> time.strftime('\ud800')
''
>>> time.strftime('\udcff')
''
>>> locale.setlocale(locale.LC_CTYPE, 'C')
'C'
>>> time.strftime('\x80')
'\x80'
>>> time.strftime('\u044f')
''
>>> time.strftime('\ud800')
''
>>> time.strftime('\udcff')
''

* Windows 8.1:
Python 3.7.0 (v3.7.0:1bf9cc5093, Jun 27 2018, 04:59:51) [MSC v.1914 64 bit 
(AMD64)] on win32
>>> import time, locale
>>> locale.getlocale()
(None, None)
>>> time.strftime('\x80')
'\x80'
>>> time.strftime('\u044f')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'locale' codec can't encode character '\u044f' in position 0: encoding error
>>> time.strftime('\ud800')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'locale' codec can't encode character '\ud800' in position 0: encoding error
>>> time.strftime('\udcff')
'ÿ' # '\xff'
>>> locale.setlocale(locale.LC_CTYPE, '')
'Russian_Russia.1251'
>>> time.strftime('\x80')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'locale' codec can't encode character '\x80' in position 0: encoding error
>>> time.strftime('\u044f')
'я' # '\u044f'
>>> time.strftime('\ud800')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'locale' codec can't encode character '\ud800' in position 0: encoding error
>>> time.strftime('\udcff')
'я' # '\u044f'

The reasons for these differences are the following:
* Python relies on either wcsftime() or strftime() from the C library, 
depending on the platform.
* For strftime(), the input is encoded into the charset of the current locale 
with the 'surrogateescape' error handler, and the output is decoded back in 
the same way.
* glibc and macOS/FreeBSD handle code points that are unrepresentable in the 
charset of the current locale differently.
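
The encode/decode step in the second point can be imitated in pure Python. 
This is only a rough emulation under assumptions: emulate_roundtrip() is a 
hypothetical helper, the C library call in the middle is skipped entirely, and 
Python's 'latin-1' and 'cp1251' codecs stand in for the Windows "C" and 
"Russian_Russia.1251" locales.

```python
def emulate_roundtrip(fmt, charset):
    """Encode and decode fmt the way the strftime() code path is described."""
    # Lone surrogates \udc80-\udcff become the raw bytes 0x80-0xff here;
    # any other unencodable code point raises UnicodeEncodeError.
    raw = fmt.encode(charset, "surrogateescape")
    # (The C strftime() call would run on `raw` at this point.)
    return raw.decode(charset, "surrogateescape")


# The escaped byte 0xff is reinterpreted in the locale's charset.
print(ascii(emulate_roundtrip("\udcff", "latin-1")))
print(ascii(emulate_roundtrip("\udcff", "cp1251")))

# Code points the charset cannot represent fail before strftime() is reached.
try:
    emulate_roundtrip("\u044f", "latin-1")
except UnicodeEncodeError as exc:
    print("encode failed:", exc.reason)
```

The emulation reproduces the substitutions and the UnicodeEncodeError cases 
seen in the Windows session above. It says nothing about the empty-string 
results on macOS/FreeBSD, which come from the C library itself.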

I suggest at least documenting that the format string, despite being a 'str', 
requires special care if it contains non-ASCII code points.
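
Until then, callers who want predictable behavior could reject non-ASCII 
format strings up front. A minimal sketch, where strict_strftime() is a 
hypothetical wrapper and not an existing API:

```python
import time


def strict_strftime(fmt, t=None):
    """Call time.strftime() only for pure-ASCII format strings."""
    try:
        fmt.encode("ascii")
    except UnicodeEncodeError:
        # Fail the same way on every platform instead of depending on the
        # C library and the current locale.
        raise ValueError("non-ASCII format string: %a" % (fmt,)) from None
    return time.strftime(fmt, time.localtime() if t is None else t)
```

This trades flexibility for consistency: a format string such as '\u044f' 
raises ValueError everywhere instead of working on some platforms only.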

The 'datetime' module docs warn about the locale-dependent output, but only 
with regard to particular format specifiers [1].

I'll submit a draft PR. Suggestions are welcome.

[1] 
https://docs.python.org/3.7/library/datetime.html#strftime-and-strptime-behavior

--
assignee: docs@python
components: Documentation
messages: 324136
nosy: belopolsky, docs@python, izbyshev, p-ganssle, taleinat
priority: normal
severity: normal
status: open
title: Document platform-specific strftime() behavior for non-ASCII format 
strings
type: enhancement
versions: Python 3.6, Python 3.7, Python 3.8
