[issue37111] Logging - Inconsistent behaviour when handling unicode

2019-06-17 Thread Vinay Sajip


Change by Vinay Sajip :


--
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed
versions: +Python 3.9 -Python 3.7, Python 3.8

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37111] Logging - Inconsistent behaviour when handling unicode

2019-06-17 Thread Vinay Sajip

Vinay Sajip  added the comment:


New changeset ca7b504a4d4c3a5fde1ee4607b9501c2bab6e743 by Vinay Sajip in branch 
'master':
bpo-37111: Add 'encoding' and 'errors' parameters to logging.basicCon… 
(GH-14008)
https://github.com/python/cpython/commit/ca7b504a4d4c3a5fde1ee4607b9501c2bab6e743


--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37111] Logging - Inconsistent behaviour when handling unicode

2019-06-13 Thread Steve Dower


Steve Dower  added the comment:

I think Vinay gets to make the call, but I also agree that some sort of error 
handling by default makes sense for logging. Especially now that it's (about to 
be) easier to override the default.

FWIW, I'd leave the default arguments as None and just coerce it as late as 
possible, before TextIOWrapper does it.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37111] Logging - Inconsistent behaviour when handling unicode

2019-06-12 Thread Inada Naoki

Inada Naoki  added the comment:

> What logging does when UnicodeEncodeError is happened?

Hm, it seems logging does best job, show enough information to stderr.

But sometime, Python is embedded in web servers and there is no stderr.  
pythonw.exe may not have stderr too

I still feel backslashreplace is better default error handler.  But it's not 
strong opinion and it shouldn't block this issue.  Go ahead.


$ cat x.py
import logging

logging.basicConfig(filename="mylog.txt")
msg = "hello, はろー"
logging.error(msg)

$ LC_ALL=en_US.US-ASCII python3 x.py
--- Logging error ---
Traceback (most recent call last):
  File 
"/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/logging/__init__.py",
 line 1037, in emit
stream.write(msg + self.terminator)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 18-20: 
ordinal not in range(128)
Call stack:
  File "x.py", line 5, in 
logging.error(msg)
Message: 'hello, \u306f\u308d\u30fc'
Arguments: ()

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37111] Logging - Inconsistent behaviour when handling unicode

2019-06-12 Thread Inada Naoki


Inada Naoki  added the comment:

> I've left it as is using the same rationale as I guess open() has at the 
> moment - "Errors should never pass silently. Unless explicitly silenced".

But open() is general purpose, and logging is for logging.

Note that stderr uses 'backslashreplace', even when 
PYTHONIOENCODING=ascii:strict is used.

And if we don't log something due to encoding error, doesn't it mean "errors 
may ignored completely silently by default"?
What logging does when UnicodeEncodeError is happened?

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37111] Logging - Inconsistent behaviour when handling unicode

2019-06-12 Thread Vinay Sajip


Vinay Sajip  added the comment:

> On the other hand, couldn't we use different default error handler?
"replace" or "backslashescape" seems better default error handler for the 
logging.

I've left it as is using the same rationale as I guess open() has at the moment 
- "Errors should never pass silently. Unless explicitly silenced".

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37111] Logging - Inconsistent behaviour when handling unicode

2019-06-12 Thread Inada Naoki


Inada Naoki  added the comment:

Hmm,, about encoding, I agree that default encoding of open() should be used.
If we change it, encoding of log files are changed unexpectedly after upgrading 
Python.

On the other hand, couldn't we use different default error handler?
"replace" or "backslashescape" seems better default error handler for the 
logging.
Is this change affects too many users?

--
nosy: +inada.naoki

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37111] Logging - Inconsistent behaviour when handling unicode

2019-06-11 Thread Vinay Sajip


Change by Vinay Sajip :


--
pull_requests: +13871
pull_request: https://github.com/python/cpython/pull/14008

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37111] Logging - Inconsistent behaviour when handling unicode

2019-06-09 Thread Vinay Sajip


Change by Vinay Sajip :


--
keywords: +patch
pull_requests: +13798
stage: test needed -> patch review
pull_request: https://github.com/python/cpython/pull/13932

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37111] Logging - Inconsistent behaviour when handling unicode

2019-06-03 Thread Steve Dower


Steve Dower  added the comment:

I agree with Paul. Consistency with open() is the most important aspect here, 
and when we investigated changing that to UTF-8 it was decided to be too 
disruptive.

That said, I'm not opposed to adding encoding/errors to basicConfig if someone 
wants to do that (and the logging specialists are okay with it).

--
resolution: not a bug -> 
stage:  -> test needed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37111] Logging - Inconsistent behaviour when handling unicode

2019-06-01 Thread Paul Moore


Paul Moore  added the comment:

> User would like Python logging of Unicode characters to be consistent

It is consistent. The encoding of

logging.basicConfig(filename='c:\\my_log.log')

is consistent with the encoding of

open('c:\\my_log.log')

Both use the system default encoding, which is not UTF-8. There is some 
discussion going on right now, as to whether it *should* be, but it isn't, and 
I wouldn't consider changing the behaviour of loging *without* changing the 
behaviour of open() to be consistent.

Logging to the console is consistent with the standard IO streams, and it was 
PEP 528 (https://www.python.org/dev/peps/pep-0528/) that made that change - 
before that, the standard IO streams, and logging to the console, used the 
console codepage.

So, while I agree that the behaviour takes a bit of work to understand, it's 
not (IMO) inconsistent, nor is it (IMO) a bug.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37111] Logging - Inconsistent behaviour when handling unicode

2019-06-01 Thread Vinay Sajip


Vinay Sajip  added the comment:

Different people have different ideas of what a default should be. You can 
pretty much guarantee that whatever a default is, someone will think it should 
be something else. The basicConfig functionality has been around in its present 
form since 2003, and people in general haven't had a problem with the current 
defaults. There is no case for changing a default unless there is a consensus 
among lots of people that a default needs changing.

--
stage: resolved -> 

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37111] Logging - Inconsistent behaviour when handling unicode

2019-06-01 Thread Jonathan


Jonathan  added the comment:

> Learning is not a waste of time. You're entitled to your opinion, but this is 
> not a bug in logging. We'll have to agree to disagree.

I agree and value learning a great deal. However learning should happen on your 
own time, NOT when a program crashes randomly and tries taking you down the 
rabbit hole. I like learning but not about unrelated things when I'm trying to 
do useful work.

Fine, if you don't consider this a bug, consider it a feature request. "User 
would like Python logging of Unicode characters to be consistent" is not an 
unreasonable request.

--
status: closed -> open
type: behavior -> enhancement

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37111] Logging - Inconsistent behaviour when handling unicode

2019-06-01 Thread Vinay Sajip


Vinay Sajip  added the comment:

Learning is not a waste of time. You're entitled to your opinion, but this is 
not a bug in logging. We'll have to agree to disagree.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37111] Logging - Inconsistent behaviour when handling unicode

2019-06-01 Thread Jonathan


Jonathan  added the comment:

> I have no idea what you mean by this.

I don't see how I can be clearer. What are the reasons for NOT implementing 
logging to file be unicode as a default? Logging to screen is unicode as a 
default. What are the reasons for not wanting consistency?

> A simple Internet search for "basicConfig encoding" yields for me as the 
> second result this Stack Overflow question 

Indeed, and it was from that question I got my solution in fact. The problem 
was the 30-60 minutes I wasted before that trying to figure out why my program 
was crashing and why it was only crashing *sometimes*. I'd written the logging 
part of the program a year ago and not really touched it since, so the logging 
module being a possible culprit was not even in my mind when the program 
crashed.

> As my example illustrated, it's quite easy to log Unicode in log files.

Yes, things are easy when you know it's necessary. It's the process of 
discovery that's an unnecessary waste of people's time. That's why I raised 
this and that's why I would consider this a bug in my own software. It's 
inconsistent, it invites problems, and it wastes peoples time.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37111] Logging - Inconsistent behaviour when handling unicode

2019-06-01 Thread Vinay Sajip


Vinay Sajip  added the comment:

> This seems like an attempt at victim blaming.

Er, no, it was a straight question about whether you'd read the documentation.

> I'm afraid this line of reasoning is suffering from selection bias, cherry 
> picking, confirmation bias, and probably some others too.

Er, no, it was stating that lots of people have successfully used logging on 
Windows with utf-8, as a way of justifying my opinion that there's no 
deficiency here that needs addressing.

> Sure "stream" is on there, but then you need to know about streams and make 
> the association with logging which I apparently don't. You have to remember 
> not everyone has your level of proficiency in the language. In fact, most 
> Python users don't.

Sure, and that's fine. Experienced developers learn stuff every day too. But 
responding to issues takes time (notice how many people will have been notified 
about this issue from the "nosy list" above; each would have to spend at least 
some time looking at it), so raising issues should be done only after 
exhausting other resources to make sure that the problem isn't in one's own 
level of knowledge or understanding, but an actual problem with the software. 
This is why I earlier suggested Stack Overflow and python-list as useful 
resources for improving one's knowledge and understanding.

> Lets put this another way - is there a reason NOT to have Unicode logging as 
> the default?

I have no idea what you mean by this. As my example illustrated, it's quite 
easy to log Unicode in log files.

A simple Internet search for "basicConfig encoding" yields for me as the second 
result this Stack Overflow question

https://stackoverflow.com/questions/21402530

for which the accepted answer is a slight variation on what I suggested earlier.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37111] Logging - Inconsistent behaviour when handling unicode

2019-05-31 Thread Jonathan


Jonathan  added the comment:

> Did you look at the basicConfig documentation before raising this issue?

This seems like an attempt at victim blaming. But yes, I did. In fact, this is 
now the third time I've looked at that page - once before raising this issue, 
once before my previous reply, and now. I note that your example and nothing 
like your example is anywhere on that page. The word "encoding" doesn't appear 
anywhere on the page either. Sure "stream" is on there, but then you need to 
know about streams and make the association with logging which I apparently 
don't. You have to remember not everyone has your level of proficiency in the 
language. In fact, most Python users don't.

Lets put this another way - is there a reason NOT to have Unicode logging as 
the default? Clearly Unicode was important enough for Guido-et-al to decide to 
throw Python 2 under the bus.
I've listed the advantages of changing it, what are the disadvantages?

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37111] Logging - Inconsistent behaviour when handling unicode

2019-05-31 Thread Vinay Sajip

Vinay Sajip  added the comment:

> Doing something as basic as logging unicode shouldn't require knowledge of 
> "handlers" - that's failing "simple is better than complex".

I reiterate my statement that it appears that you aren't sufficiently familiar 
with how logging is designed to work. Logging is more complex than print() or 
open() because it has to work flexibly in lots of different scenarios. Handlers 
are part of this, as are Formatters and Filters.

See my 2009 blog post

https://plumberjack.blogspot.com/2009/09/python-logging-101.html

for more information for why logging is designed the way it is.

If you want to configure a stream with utf-8 encoding, without explicitly using 
handlers, this is easily doable, as follows:

C:\Users\Vinay> \python37\python
Python 3.7.3 (v3.7.3:ef4ec6ed12, Mar 25 2019, 22:22:05) [MSC v.1916 64 bit 
(AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import logging
>>> logging.basicConfig(stream=open(r'c:\temp\my_log.log', 'w', 
>>> encoding='utf-8'))
>>> logging.error('จุด1')
>>> ^Z

and, lo and behold, the file c:\temp\my_log.log contains

ERROR:root:จุด1

The use of the stream keyword parameter is clearly documented at

https://docs.python.org/3/library/logging.html#logging.basicConfig

Did you look at the basicConfig documentation before raising this issue?

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37111] Logging - Inconsistent behaviour when handling unicode

2019-05-31 Thread Jonathan


Jonathan  added the comment:

Thank you for your comments but this wasn't a question and I maintain this is a 
bug, or at least undesirable behaviour. I'd consider it a bug in my own 
software.

Reasoning:
* It's an inconsistent default with the logging to screen. This causes more 
complexity for users when their bug is intermittent.
* Despite your assertion, it's not documented anywhere on the logging docs (I 
did check before creating this bug when trying to figure out what's going on) - 
The word "utf" or "unicode" doesn't appear on the logging page, or any of the 
two tutorials, or the logging.handlers page. There's something in the cookbook 
but that's about BOM's.
* Much of the world's native characters won't log to ASCII

Per this page: https://docs.python.org/3/howto/unicode.html
"UTF-8 is one of the most commonly used encodings, and Python often defaults to 
using it."

> People have been using logging, on Windows, without problems, for years, 
> often using utf-8 to encode their log files.
I'm afraid this line of reasoning is suffering from selection bias, cherry 
picking, confirmation bias, and probably some others too. Clearly people have 
had problems before because it was from one of those folks I took the solution.

Doing something as basic as logging unicode shouldn't require knowledge of 
"handlers" - that's failing "simple is better than complex".

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37111] Logging - Inconsistent behaviour when handling unicode

2019-05-31 Thread Vinay Sajip


Vinay Sajip  added the comment:

> Given that the file created by the logger is utf-8, it's unclear why it 
> doesn't work ... I found a workaround by using a Handler

Loggers don't create files - handlers do. The file that you attached seems to 
be just a text file containing ASCII text. Any ASCII is also utf-8, so 
Notepad++'s assertion doesn't mean anything in this situation.

> so that people don't get unpleasant surprises that even more painful to debug 
> when things only break in certain logging modes?

People have been using logging, on Windows, without problems, for years, often 
using utf-8 to encode their log files. Perhaps you need to read the 
documentation and look at examples more carefully - I'm not intending to be 
rude, but say that because using a handler isn't some kind of workaround, and 
you thinking that it was indicates that you're not sufficiently familiar with 
the documentation.

I'd suggest posting questions on Stack Overflow or the python-list mailing 
list, to establish whether there really seems to be a Python bug, before 
actually logging a Python issue. I appreciate that you're trying to improve 
Python.

--
resolution:  -> not a bug
stage:  -> resolved
status: open -> closed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37111] Logging - Inconsistent behaviour when handling unicode

2019-05-31 Thread SilentGhost


Change by SilentGhost :


--
components: +Windows
nosy: +paul.moore, steve.dower, tim.golden, zach.ware
type:  -> behavior

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37111] Logging - Inconsistent behaviour when handling unicode

2019-05-31 Thread Jonathan


Jonathan  added the comment:

It definitely claims to be "utf-8" in NotePad++. I've attached it if you want 
to double-check.

(Windows 7)

--
Added file: https://bugs.python.org/file48380/my_log.log

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37111] Logging - Inconsistent behaviour when handling unicode

2019-05-31 Thread SilentGhost


Change by SilentGhost :


--
versions: +Python 3.7, Python 3.8 -Python 3.6

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37111] Logging - Inconsistent behaviour when handling unicode

2019-05-31 Thread SilentGhost


SilentGhost  added the comment:

> Given that the file created by the logger is utf-8

I don't think this is true. Default file handler created using filename 
argument is opened with encoding None (meaning that it relies on system-default 
encoding, which is on Windows is not utf-8). And it also explains why your 
work-around works, provided you've opened file handler with explicit utf-8 
encoding.

--
nosy: +SilentGhost, vinay.sajip

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37111] Logging - Inconsistent behaviour when handling unicode

2019-05-31 Thread Jonathan

New submission from Jonathan :

Python is inconsistent in how it handles errors that have some unicode 
characters. It works to screen but fails to log

This works:
```
>>> import logging
>>> logging.error('จุด1')
ERROR:root:จุด1
```

The following breaks:

```
>>> import logging
>>> logging.basicConfig(filename='c:\\my_log.log')
>>> logging.error('จุด1')
```

This raises a unicode error:
UnicodeEncodeError: 'charmap' codec can't encode characters in position 11-13: 
character maps to 

Python 3.6.3

Given that the file created by the logger is utf-8, it's unclear why it doesn't 
work.

I found a workaround by using a Handler, but surely the loggers should all work 
the same way so that people don't get unpleasant surprises that even more 
painful to debug when things only break in certain logging modes?

--
messages: 344053
nosy: jonathan-lp
priority: normal
severity: normal
status: open
title: Logging - Inconsistent behaviour when handling unicode
versions: Python 3.6

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com