New submission from Aimon Bustardo <abusta...@morphlabs.com>:

Ubuntu 12.04 LTS 64bit
python2.7-minimal 2.7.3-0ubuntu3
rsyslog 5.8.6-1ubuntu8

Python's logging SysLogHandler encodes unicode messages as UTF-8 before sending
them to syslog. It also prepends the UTF-8 Byte Order Mark (BOM) of the Unicode
Standard. This prepended BOM shows up as bad characters in the log when using
rsyslog (I have not verified this with standard syslog or syslog-ng).

Example log line:

Jul 25 13:36:03 mc ï»¿2012-07-25 13:36:03 INFO nova.api.openstack.wsgi 
[req-48a555a5-6d2a-4a38-8384-3b4684357e72 19f932a5b0b34655989f4cb761522bb3 
2617e657fdf84569a6be7977318e46c8] 
http://MASKED:8774/v1.1/2617e657fdf84569a6be7977318e46c8/os-hosts/MASKED.json?ignore_awful_caching1343248563
 returned with HTTP 200

Note the 'ï»¿' before the date field.

An interesting explanation of the issue, found on another site:

"Yes, "ï»¿" is the Byte Order Mark (BOM) of the Unicode Standard. Specifically 
it is the hex bytes EF BB BF, which form the UTF-8 representation of the BOM, 
misinterpreted as ISO 8859/1 text instead of UTF-8."
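
As a quick check of that explanation, here is a minimal sketch using only the standard codecs module. It decodes the BOM bytes both ways; the variable names are just for illustration.

```python
import codecs

# The UTF-8 BOM is the three bytes EF BB BF.
bom = codecs.BOM_UTF8

# Read as ISO 8859-1 (latin-1), each byte becomes one visible character,
# which gives the three-character sequence seen in the log line.
as_latin1 = bom.decode("latin-1")

# Read as UTF-8, the same three bytes are the single code point U+FEFF.
as_utf8 = bom.decode("utf-8")
```

So the log shows three stray characters when the receiving side, or whatever displays the log, does not treat the bytes as UTF-8.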

If I patch the code in /usr/lib/python2.7/logging/handlers.py:
------------------------------------------
@@ -797,9 +797,10 @@
                                             self.mapPriority(record.levelname))
         # Message is a string. Convert to bytes as required by RFC 5424
         if type(msg) is unicode:
             msg = msg.encode('utf-8')
-            if codecs:
-                msg = codecs.BOM_UTF8 + msg
+            #if codecs:
+            #    msg = codecs.BOM_UTF8 + msg
         msg = prio + msg
         try:
             if self.unixsocket:

----------------------------------------

The logs will now appear normally. What is happening with the 'codecs' 
condition? Is this controllable through config? Is this a bug in rsyslog? 
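
For reference, a minimal sketch of what the handler lines quoted above do to a message, with the BOM step made optional. The function `build_syslog_payload` and its `prepend_bom` flag are hypothetical names for illustration only, not part of the logging module, and the priority string `<14>` is just an example value.

```python
import codecs


def build_syslog_payload(prio, msg, prepend_bom=True):
    """Hypothetical helper mirroring the quoted handler logic.

    prio is the priority prefix (for example "<14>"), msg is the
    formatted log message.
    """
    # Only text (unicode) messages are encoded; byte strings pass through
    # untouched, so they never receive a BOM.
    if not isinstance(msg, bytes):
        msg = msg.encode("utf-8")
        if prepend_bom:
            msg = codecs.BOM_UTF8 + msg
    return prio.encode("ascii") + msg


# With the BOM, as the unpatched code behaves for unicode messages.
with_bom = build_syslog_payload("<14>", u"hello")

# Without the BOM, as after the patch above.
without_bom = build_syslog_payload("<14>", u"hello", prepend_bom=False)
```

The only difference between the two payloads is the three BOM bytes between the priority prefix and the message text.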

Related tickets:

https://bugs.launchpad.net/openstack-common/+bug/1029116
https://bugs.launchpad.net/ubuntu/+source/python2.7/+bug/1029640
http://bugzilla.adiscon.com/show_bug.cgi?id=346

----------
components: IO, Library (Lib), Unicode
messages: 166520
nosy: Aimon.Bustardo, ezio.melotti
priority: normal
severity: normal
status: open
title: UTF8 BOM incorrectly prepended to syslog messages when using rsyslog
type: behavior
versions: Python 2.7

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue15462>
_______________________________________