Public bug reported:

Ubuntu 12.04 LTS 64-bit
python2.7-minimal                  2.7.3-0ubuntu3
rsyslog                            5.8.6-1ubuntu8

Python's logging syslog handler encodes Unicode messages as UTF-8 before
sending them to syslog. It also prepends the UTF-8 Byte Order Mark (BOM).
The prepended BOM shows up as bad characters in logs written by rsyslog
(I have not verified this with standard syslog or syslog-ng).
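For illustration, here is a minimal sketch of how such a datagram is put together, assuming the layout visible in the patch further down (priority header, then BOM, then the UTF-8 message). The helper name build_syslog_payload and its with_bom parameter are hypothetical, and the two constants are copied from SysLogHandler.LOG_USER and SysLogHandler.LOG_INFO.

```python
import codecs

# Same values as SysLogHandler.LOG_USER and SysLogHandler.LOG_INFO.
LOG_USER = 1
LOG_INFO = 6


def build_syslog_payload(text, facility=LOG_USER, severity=LOG_INFO,
                         with_bom=True):
    """Hypothetical helper: return '<PRI>' + optional BOM + UTF-8 message."""
    # The priority value is facility * 8 + severity, wrapped in angle brackets.
    prio = ('<%d>' % ((facility << 3) | severity)).encode('ascii')
    body = text.encode('utf-8')
    if with_bom:
        # These three bytes (EF BB BF) are what ends up in the log file.
        body = codecs.BOM_UTF8 + body
    return prio + body


payload = build_syslog_payload(u'2012-07-25 13:36:03 INFO example')
print(repr(payload))
```

With with_bom=False the same call yields the priority header followed directly by the message text, which corresponds to what the patched handler sends.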

Example log line:

Jul 25 13:36:03 mc ï»¿2012-07-25 13:36:03 INFO nova.api.openstack.wsgi
[req-48a555a5-6d2a-4a38-8384-3b4684357e72
19f932a5b0b34655989f4cb761522bb3 2617e657fdf84569a6be7977318e46c8]
http://MASKED:8774/v1.1/2617e657fdf84569a6be7977318e46c8/os-hosts/MASKED.json?ignore_awful_caching1343248563
returned with HTTP 200

Note the 'ï»¿' before the date field at the start of the message
(2012-07-25).

An explanation of these characters, found on another site:

"Yes, "ï»¿" is the Byte Order Mark (BOM) of the Unicode Standard.
Specifically it is the hex bytes EF BB BF, which form the UTF-8
representation of the BOM, misinterpreted as ISO 8859/1 text instead of
UTF-8.

Probably what it means is that you are using a text editor that is
saving files in UTF-8 with the BOM, when it should be saving without the
BOM. It could be PHP files that have the BOM, in which case they'd
appear as literal text on your page. Or it could be translated text you
pasted into Joomla! edit windows.

The Unicode Consortium's FAQ on the Byte Order Mark is at
http://www.unicode.org/faq/utf_bom.html#BOM ."
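The misinterpretation described in the quote can be reproduced in a few lines. This is only a sketch using the standard codecs module; the variable names are made up for the example.

```python
import codecs

bom = codecs.BOM_UTF8  # the three bytes EF BB BF

# Decoded as ISO 8859-1 (latin-1), each byte becomes its own character.
as_latin1 = bom.decode('latin-1')

# Decoded as UTF-8, the same bytes are the single code point U+FEFF.
as_utf8 = bom.decode('utf-8')

# The 'utf-8-sig' codec drops a leading BOM while decoding.
stripped = (bom + b'INFO').decode('utf-8-sig')

print(repr(as_latin1), repr(as_utf8), repr(stripped))
```

A reader that treats the stream as ISO 8859-1 therefore shows three stray characters, while a UTF-8 aware reader sees one invisible code point.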

Note that if I edit /usr/lib/python2.7/logging/handlers.py as shown in
this patch, the bad characters go away:
----------------------------------------------------------
@@ -797,9 +797,10 @@
                                             self.mapPriority(record.levelname))
         # Message is a string. Convert to bytes as required by RFC 5424
         if type(msg) is unicode:
+            # Morph
             msg = msg.encode('utf-8')
-            if codecs:
-                msg = codecs.BOM_UTF8 + msg
+            #if codecs:
+            #    msg = codecs.BOM_UTF8 + msg
         msg = prio + msg
         try:
             if self.unixsocket:

----------------------------------------------------

Perhaps something is wrong with the 'if codecs:' condition?
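To check what a given interpreter actually puts on the wire without editing handlers.py, one option is to point a SysLogHandler at a throwaway UDP socket on localhost and inspect the datagram. The sketch below assumes loopback UDP is available; capture_syslog_datagram and has_bom_after_priority are hypothetical helper names, not part of the logging module.

```python
import codecs
import logging
import logging.handlers
import socket


def capture_syslog_datagram(text):
    """Send one INFO record through SysLogHandler and return the raw bytes."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.settimeout(2.0)
    receiver.bind(('127.0.0.1', 0))  # port 0: let the OS pick a free port
    handler = None
    try:
        # An address tuple selects UDP, the handler's default transport.
        handler = logging.handlers.SysLogHandler(
            address=receiver.getsockname())
        record = logging.makeLogRecord(
            {'msg': text, 'levelname': 'INFO', 'levelno': logging.INFO})
        handler.emit(record)
        payload, _sender = receiver.recvfrom(4096)
        return payload
    finally:
        if handler is not None:
            handler.close()
        receiver.close()


def has_bom_after_priority(payload):
    """True if the UTF-8 BOM directly follows the '<PRI>' header."""
    # find() returns -1 when there is no '>', so the check starts at byte 0.
    header_end = payload.find(b'>') + 1
    return payload[header_end:].startswith(codecs.BOM_UTF8)


if __name__ == '__main__':
    data = capture_syslog_datagram(u'probe message')
    print(repr(data))
    print('BOM present:', has_bom_after_priority(data))
```

Running this before and after applying the patch should show whether the three BOM bytes follow the priority header.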

** Affects: python2.7 (Ubuntu)
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1029640

Title:
  Bad characters in Python logger output when using rsyslog
