On 26 Oct, 09:49 pm, _...@lvh.io wrote:
Hi everyone,
I'm working on #6749 for porting t.p.logfile to python3. I'm dealing
with
some test failures, which you can see from the buildbot here:
http://buildbot.twistedmatrix.com/builders/python-3.3-tests/builds/1602/steps/shell/logs/stdio
I have pasted the relevant bit into a gist here:
https://gist.github.com/lvh/7174766
I think what's happening is that LogFile.write should take native
strings
(since that's what log.msg takes). However, I'm opening all files in
binary
mode, since that's on the reviewer checklist (point 8) for the Python 3
porting plan.
Argh. I was all set up to object to the "...should take native strings"
bit but then I reviewed some history.
If you haven't already, read
<https://twistedmatrix.com/trac/ticket/5932>. In hindsight, it would
have been nice if this branch had come with documentation of some kind.
From what I can tell, this means:
1. yep, `log.msg` takes native strings
2. LogPublisher doesn't do anything with encoding or decoding
3. a log observer can choose to do whatever it wants with what it gets
(but it should be prepared to handle bytes on python 2 and unicode
on python 3)
So... invent some policy. What encoding does Twisted's built-in file
log observer use for the files it writes? Up until now the answer has
been ASCII because it doesn't try to handle unicode so anything non-
ASCII results in exceptions (unless you're using PyGTK... let's not go
there).
UTF-8 is obviously the only possible correct answer, I guess. This
needs to be documented, of course.
And I predict that next someone will come along with a feature request
for a command-line option to twistd to make it write logs with a
different encoding. :(
Whether you open the file in binary mode or not is up to you in this
case, I think. You could do that and handle the encoding yourself or
you could open it in text mode with the right encoding and let it handle
the encoding.
The blanket statement on
https://twistedmatrix.com/trac/wiki/Plan/Python3 about open calls is
perhaps slightly too general. It seems like that should apply to cases
where the behavior is not supposed to be changing - which should be the
case for the majority of porting work. However, `log.msg` has already
changed behavior as part of the port.
*Or*, it now occurs to me, just stick with the ASCII-only policy that's
already in place. I'd even say this is more correct since porting isn't
supposed to change behavior. Leave support for some other codec for
another ticket (perhaps #989). Apart from being simpler (I hope) and
avoiding breaching the documented porting guidelines, this also means
someone will actually have to think about unicode support on Python 2 as
well. Saying we support unicode in the logging system is a lot better
than saying we support unicode in the logging system on Python 3 only.
Jean-Paul
_______________________________________________
Twisted-Python mailing list
Twisted-Python@twistedmatrix.com
http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python