[Yahoo-eng-team] [Bug 1718439] Re: Apparent lack of locking in conductor logs

2018-05-10 Thread Ben Nemec
Yeah, since we stopped calling get_logger from oslo.log I don't think we
can fix this in the library.  I'm going to add Nova back to the bug so
they can re-triage base on this information.

** Also affects: nova
   Importance: Undecided
   Status: New

** Changed in: oslo.log
   Status: New => Opinion

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1718439

Title:
  Apparent lack of locking in conductor logs

Status in OpenStack Compute (nova):
  New
Status in oslo.log:
  Opinion

Bug description:
  I'm looking at conductor logs generated by a customer running RH OSP
  10 (Newton). The logs appear to be corrupt in a manner I'd expect to
  see if 2 processes were writing to the same log file simultaneously.
  For example:

  ===
  2017-09-14 15:54:39.689 120626 ERROR nova.servicegroup.drivers.db return 
self.dbapi.connect(*cargs, **cparams)
  2017-09-14 15:54:39.689 120626 ERROR nova.s2017-09-14 15:54:39.690 120562 
ERROR nova.servicegroup.drivers.db [-] Unexpected error while reporting service 
status
  2017-09-14 15:54:39.690 120562 ERROR nova.servicegroup.drivers.db Traceback 
(most recent call last):
  ===

  Notice how a new log starts part way through the second line above.
  This also results in log entries in the wrong sort order:

  ===
  2017-09-14 15:54:39.690 120562 ERROR nova.servicegroup.drivers.db return 
self.dbapi.connect(*cargs, **cparams)
  2017-09-14 15:54:39.690 120562 ERROR nova.servicegroup.drivers.db   File 
"/usr/lib/python2.7/site-packages/pymysql/__init__.py", line 88, in Connect
  2017-09-14 15:54:39.689 120626 ERROR nova.servicegroup.drivers.db return 
Connection(*args, **kwargs)
  2017-09-14 15:54:39.689 120626 ERROR nova.servicegroup.drivers.db   File 
"/usr/lib/python2.7/site-packages/pymysql/connections.py", line 657, in __init__
  ===

  Note how the first 2 lines are after the last 2 by timestamp, as
  presumably the last 2 are a continuation of a previous log entry. This
  confounds merge sorting of log files, which is exceptionally useful.

  We also see truncated lines with no timestamp which aren't a
  continuation of the previous line:

  ===
  2017-09-14 15:54:39.690 120607 ERROR nova.servicegroup.drivers.db 
DBConnectionError: (pymysql.err.OperationalError) (2013, 'Lost connection to 
MySQL server during query') [SQL: u'SELECT 1']
  2017-09-14 15:54:39.690 120607 ERROR nova.servicegroup.drivers.db 
  elf._execute_and_instances(context)
  ===

  I strongly suspect this is because multiple conductors are running in
  separate processes, and are therefore not benefiting from the thread
  safety of python's logging.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1718439/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1718439] Re: Apparent lack of locking in conductor logs

2017-09-20 Thread Matthew Booth
After some brief discussion in #openstack-nova I've moved this to
oslo.log. The issue here appears to be that we spawn multiple separate
conductor processes writing to the same nova-conductor.log file. We
don't want to stop doing this, as it would break people.

It seems that by default python logging uses thread logs rather than
external locks:

  https://docs.python.org/2/library/multiprocessing.html#logging

Suggest the fix might be to explicitly use multiprocessing.get_logger(),
or at least provide an option to do this when we know it's required.

** Project changed: nova => oslo.log

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1718439

Title:
  Apparent lack of locking in conductor logs

Status in oslo.log:
  New

Bug description:
  I'm looking at conductor logs generated by a customer running RH OSP
  10 (Newton). The logs appear to be corrupt in a manner I'd expect to
  see if 2 processes were writing to the same log file simultaneously.
  For example:

  ===
  2017-09-14 15:54:39.689 120626 ERROR nova.servicegroup.drivers.db return 
self.dbapi.connect(*cargs, **cparams)
  2017-09-14 15:54:39.689 120626 ERROR nova.s2017-09-14 15:54:39.690 120562 
ERROR nova.servicegroup.drivers.db [-] Unexpected error while reporting service 
status
  2017-09-14 15:54:39.690 120562 ERROR nova.servicegroup.drivers.db Traceback 
(most recent call last):
  ===

  Notice how a new log starts part way through the second line above.
  This also results in log entries in the wrong sort order:

  ===
  2017-09-14 15:54:39.690 120562 ERROR nova.servicegroup.drivers.db return 
self.dbapi.connect(*cargs, **cparams)
  2017-09-14 15:54:39.690 120562 ERROR nova.servicegroup.drivers.db   File 
"/usr/lib/python2.7/site-packages/pymysql/__init__.py", line 88, in Connect
  2017-09-14 15:54:39.689 120626 ERROR nova.servicegroup.drivers.db return 
Connection(*args, **kwargs)
  2017-09-14 15:54:39.689 120626 ERROR nova.servicegroup.drivers.db   File 
"/usr/lib/python2.7/site-packages/pymysql/connections.py", line 657, in __init__
  ===

  Note how the first 2 lines are after the last 2 by timestamp, as
  presumably the last 2 are a continuation of a previous log entry. This
  confounds merge sorting of log files, which is exceptionally useful.

  We also see truncated lines with no timestamp which aren't a
  continuation of the previous line:

  ===
  2017-09-14 15:54:39.690 120607 ERROR nova.servicegroup.drivers.db 
DBConnectionError: (pymysql.err.OperationalError) (2013, 'Lost connection to 
MySQL server during query') [SQL: u'SELECT 1']
  2017-09-14 15:54:39.690 120607 ERROR nova.servicegroup.drivers.db 
  elf._execute_and_instances(context)
  ===

  I strongly suspect this is because multiple conductors are running in
  separate processes, and are therefore not benefiting from the thread
  safety of python's logging.

To manage notifications about this bug go to:
https://bugs.launchpad.net/oslo.log/+bug/1718439/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp