[issue14452] SysLogHandler sends invalid messages when using unicode

2014-07-29 Thread Daniel Pocock

Daniel Pocock added the comment:

As a workaround, Python 3.2 users can clobber the global variable codecs like 
this from their own init code:

logging.handlers.codec = None

There is a more complete example here:

https://github.com/dpocock/python-rfc5424-logging-formatter

--
nosy: +pocock

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14452] SysLogHandler sends invalid messages when using unicode

2013-01-09 Thread Roy Smith

Changes by Roy Smith :


--
nosy: +roysmith

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14452] SysLogHandler sends invalid messages when using unicode

2012-08-21 Thread Jody McIntyre

Changes by Jody McIntyre :


--
nosy: +scjody

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14452] SysLogHandler sends invalid messages when using unicode

2012-04-16 Thread Roundup Robot

Roundup Robot  added the comment:

New changeset 603301cfb194 by Vinay Sajip in branch 'default':
Closes #14452: brought tests in line with removal of BOM insertion code.
http://hg.python.org/cpython/rev/603301cfb194

--
status: open -> closed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14452] SysLogHandler sends invalid messages when using unicode

2012-04-16 Thread R. David Murray

R. David Murray  added the comment:

This appears to be failing on the buildbots:

http://www.python.org/dev/buildbot/all/builders/x86%20OpenIndiana%203.x/builds/3358/steps/test/logs/stdio
http://www.python.org/dev/buildbot/all/builders/x86%20Gentoo%20Non-Debug%203.x/builds/2037/steps/test/logs/stdio

--
nosy: +r.david.murray
status: closed -> open

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14452] SysLogHandler sends invalid messages when using unicode

2012-04-16 Thread Roundup Robot

Roundup Robot  added the comment:

New changeset af46a001d5ec by Vinay Sajip in branch '2.7':
Issue #14452: remove BOM insertion code.
http://hg.python.org/cpython/rev/af46a001d5ec

New changeset 89ab589f6fa7 by Vinay Sajip in branch '3.2':
Closes #14452: remove BOM insertion code.
http://hg.python.org/cpython/rev/89ab589f6fa7

New changeset 372aa4267a43 by Vinay Sajip in branch 'default':
Closes #14452: remove BOM insertion code.
http://hg.python.org/cpython/rev/372aa4267a43

--
nosy: +python-dev
resolution:  -> fixed
stage:  -> committed/rejected
status: open -> closed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14452] SysLogHandler sends invalid messages when using unicode

2012-04-12 Thread Vinay Sajip

Vinay Sajip  added the comment:

> What would the behavior be when run on an older version without support for 
> that placeholder be?

Then it would fail when the format string contained e.g. %(UTF8BOM)s and there 
was no corresponding attribute in the LogRecord - but that's true of any 
feature which is introduced in newer Pythons. I get that it would be unexpected 
in a point release, and that's why I've posted about this on c.l.py (no 
feedback from there so far).

> I'm okay with adding this but wouldn't be surprised if release managers are 
> not.

Hence the post to python-dev, but no release manager has expressed an opinion 
yet.

> Perhaps just fixing it nicely in 3.3 is sufficient while documenting the 
> misbehavior as a known issue for 2.7 and 3.2.

This is doable at a user level, except for the fact that the BOM insertion is 
currently unconditional. If I just remove the BOM insertion in 2.7 and 3.2, 
then I don't need to do the UTF8BOM placeholder thing in the stdlib; I could 
just add a cookbook recipe telling users how to do it. I am thinking about a 
different solution for 3.3 anyway, i.e. adding one or more overridable methods 
to SysLogHandler.

Since no one has objected on c.l.py about the proposed change (which implied 
that if there were no objections, the change would happen) Marko may be right 
that not many people are affected, or care. I'll wait a little while longer, 
and if no objections are forthcoming I'll remove the BOM insertion in 2.7 and 
3.2, add a cookbook recipe for those who need a BOM and leave it at that for 
those versions.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14452] SysLogHandler sends invalid messages when using unicode

2012-04-12 Thread Gregory P. Smith

Gregory P. Smith  added the comment:

Your three step approach makes sense... But it _is_ still technically a new API 
though in that the UTF8BOM placeholder for LogRecord's is being introduced.  
What would the behavior be when run on an older version without support for 
that placeholder be?

I'm okay with adding this but wouldn't be surprised if release managers are 
not.  But I personally have no need for syslog logging from Python so it'd be 
better to have someone who needs this to work properly chime in.

Perhaps just fixing it nicely in 3.3 is sufficient while documenting the 
misbehavior as a known issue for 2.7 and 3.2.

--
nosy: +gregory.p.smith

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14452] SysLogHandler sends invalid messages when using unicode

2012-04-11 Thread Vinay Sajip

Vinay Sajip  added the comment:

> But why on earth would one want a BOM in UTF-8-encoded data?  It is 
> byte-order independent!

Lord only knows, but the RFC does call for it - msg157572 has an actual excerpt 
from RFC 5424.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14452] SysLogHandler sends invalid messages when using unicode

2012-04-11 Thread Tim Golden

Tim Golden  added the comment:

It's used by some systems (Windows Notepad does this
if you save as UTF8) to indicate that the byte stream
*is* UTF8-encoded. It's not so much a BOM as a magic cookie.

I can't speak for syslog, I'm afraid

TJG

--
nosy: +tim.golden

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14452] SysLogHandler sends invalid messages when using unicode

2012-04-11 Thread Guido van Rossum

Guido van Rossum  added the comment:

But why on earth would one want a BOM in UTF-8-encoded data?  It is byte-order 
independent!

--
nosy: +gvanrossum

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14452] SysLogHandler sends invalid messages when using unicode

2012-04-11 Thread Vinay Sajip

Vinay Sajip  added the comment:

I have a possible suggestion about how to resolve this issue:

The SysLogHandler will not do BOM insertion unless the message is Unicode. If 
it is Unicode, it will add the attribute 'UTF8BOM' to the LogRecord, with the 
value u'\ufeff'. The record will then be formatted; if the format string 
contains the UTF8BOM placeholder, it will be replaced with the value u'\ufeff', 
which, when encoded, results in the UTF-8 BOM value '\xef\xbb\xbf'. The user of 
the format string is responsible for ensuring that:

1. If there's no UTF8BOM placeholder in the format string, everything in the 
formatted result must always encode to plain ASCII when encoded using UTF-8.

2. If there is a UTF8BOM placeholder in the format string, everything in the 
formatted result prior to the placeholder must always encode to plain ASCII 
when encoded using UTF-8. The stuff following can, of course, be free of that 
restriction.

3. The end result of encoding should be a prefix which is bytes of pure ASCII, 
then the BOM (if the placeholder is present in the format string), then bytes 
of UTF-8 encoded Unicode.

In any case, a Unicode string will be encoded using UTF-8. If no UTF8BOM 
placeholder was present, no BOM will be; the message can be considered to just 
be a set of octets, which just happens to be UTF-8 encoded Unicode. If the 
placeholder was present, the BOM should appear at the appropriate place to 
comply with RFC 5424.

On 3.2, the message will always be Unicode, and the above processing will take 
place (whereas on 2.x it will be conditional on the type of the formatted 
message string being Unicode).

This seems to provide a resolution to the issue which can be solved without API 
changes, and with changes to the format string if BOM insertion is needed. With 
no UTF8BOM placeholder, the BOM will simply not be inserted. Can you comment on 
this suggestion?

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14452] SysLogHandler sends invalid messages when using unicode

2012-04-06 Thread Vinay Sajip

Changes by Vinay Sajip :


--
assignee:  -> vinay.sajip

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14452] SysLogHandler sends invalid messages when using unicode

2012-04-05 Thread marko kreen

marko kreen  added the comment:

Note additional brokenness in BOM addition:
* It does add BOM even when msg is ascii, just because it was in unicode() 
string.
* It does not add BOM to UTF8 string if it is already encoded to str().

So highly doubt anybody actually relies on it.  And keeping such addition just 
increases chance that anybody starts to rely on it, so it would be good to just 
drop it.

For 3.3, I think it would be best to move syslog packet formatting out from 
emit(), because emit() already contains noticeable amount of unrelated code, so 
full override it is annoying.

I suggested changing BOM addition so that it only adds it to %(message)s, but 
such change could cause backwards incompatibility. So it does not look like 
good idea.

But dropping BOM even in old Python *does* look like good idea, as anybody 
using SysLoghandler needs to deal with BOM-less UTF8 anyway,
so dropping it will not create backwards incompatibility.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14452] SysLogHandler sends invalid messages when using unicode

2012-04-05 Thread Vinay Sajip

Vinay Sajip  added the comment:

Ok, I see what the problem is. I could go for option 1 - leave the BOM out, 
encode the string as UTF-8 but send it as just a bunch of bytes, i.e. the 
MSG-ANY variant of the spec. However, this could break any existing code that 
doesn't use structured data before the message, as you are doing, and relies on 
the MSG-UTF8 variant. So while agreeing with you that the situation isn't 
ideal, I don't see how I can change things while preserving backward 
compatibility, other than:

Introduce a class-level _insert_BOM attribute, defaulting to True but which can 
be set to False on a per-instance level. The BOM would only be inserted if this 
were True, so the current behaviour is preserved, but you could set the 
attribute to False and leave the BOM out. However, isn't ideal either, as it 
requires you to be sensitive to the exact version of Python in use - not easy 
if you're developing a library rather than an application.

I will consider a different approach in 3.3, e.g. provide one or more 
overridable methods to construct the message sent across the socket.

Thoughts? If these methods won't suffice, you can always resort to subclassing 
the handler and overriding the emit method (not that that's ideal, either).

--
resolution: invalid -> 

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14452] SysLogHandler sends invalid messages when using unicode

2012-04-05 Thread marko kreen

marko kreen  added the comment:

The 'msg' in SysLogHandler does not correspond to MSG in RFC.
Nor to %(message)s in log record.

RFC:

SYSLOG-MSG  = HEADER SP STRUCTURED-DATA [SP MSG]
HEADER  = PRI VERSION SP TIMESTAMP SP HOSTNAME
  SP APP-NAME SP PROCID SP MSGID
PRI = "<" PRIVAL ">"
[...]
MSG = MSG-ANY / MSG-UTF8
MSG-ANY = *OCTET ; not starting with BOM
MSG-UTF8= BOM UTF-8-STRING
BOM = %xEF.BB.BF


logging:

The SysLogHandler does not provide any fields after PRI.
Which is OK, those can be added via format string.

We are using following formatter to get RFC-like structure
to message:

  [formatter_syslog]
  format=%(hostname)s %(service_name)s %(job_name)s %(message)s

That's not fully compliant, but that's beside the point.
The problem is that SysLogHandler puts BOM before hostname
not before %(message)s.  In Python 2.6 it put it even before PRI,
which was even more annoying.

I see 2 solutions to this:

1) Stop putting BOM at all, its not necessary.
2) Put the BOM into %(message)s somehow, thus working with whatever
format is given.

--
status: pending -> open

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14452] SysLogHandler sends invalid messages when using unicode

2012-03-30 Thread Vinay Sajip

Vinay Sajip  added the comment:

> the RFC specifies that the BOM should prefix only unstructured
> message part, but current Python implementation puts it in the
> middle of structured part, thus confusing RFC-compliant receivers.

How do you work that out? The implementation puts the BOM before the message 
part:

prio = '<%d>' % self.encodePriority(...)
prio = prio.encode('utf-8')
msg = msg.encode('utf-8')
if codecs:
msg = codecs.BOM_UTF8 + msg
msg = prio + msg

Perhaps I've misunderstood you or the RFC5424, but I don't quite see how. I'll 
mark this as "pending" and "invalid" awaiting a clarification from you (ideally 
with a specific example).

--
resolution:  -> invalid
status: open -> pending

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14452] SysLogHandler sends invalid messages when using unicode

2012-03-30 Thread R. David Murray

Changes by R. David Murray :


--
versions:  -Python 2.6, Python 3.1, Python 3.4

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14452] SysLogHandler sends invalid messages when using unicode

2012-03-30 Thread R. David Murray

Changes by R. David Murray :


--
nosy: +vinay.sajip

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14452] SysLogHandler sends invalid messages when using unicode

2012-03-30 Thread marko kreen

Changes by marko kreen :


--
type:  -> behavior

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14452] SysLogHandler sends invalid messages when using unicode

2012-03-30 Thread marko kreen

New submission from marko kreen :

SysLogHandler converts message to utf8 and adds BOM, supposedly
to conform with RFC5424, but the implementation is broken:
the RFC specifies that the BOM should prefix only unstructured
message part, but current Python implementation puts it in the
middle of structured part, thus confusing RFC-compliant receivers.

Simplest fix would be to just remove the BOM adding.

--
components: Library (Lib)
messages: 157135
nosy: zmk
priority: normal
severity: normal
status: open
title: SysLogHandler sends invalid messages when using unicode
versions: Python 2.6, Python 2.7, Python 3.1, Python 3.2, Python 3.3, Python 3.4

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com