> > 
> > The approach to this thing has been wrong from the word go.
> > Someone suggested a *solution* (flowid) without stating the
> > problem.
> > What Saggi and I are trying to figure out is the problem and from
> > there devise a proper solution.
> > Because the problem has not been well defined (real use case with
> > real logs to back it up) then this discussion is up in the air.
> > We can argue about it till kingdom come but it's all theoretical
> > and irrelevant.
> > The fact of the matter, though, is that adding more IDs will
> > clutter the log, and the extra code means more places for bugs.
> > There had better be a good reason for it; otherwise we're
> > making things more difficult to debug, not easier.
> > The case where you had to spend 30 minutes finding the right log
> > happened simply because you did not have enough info in the engine
> > log before.  Now
> > that we do have this info (every call to vdsm and every result),
> > there is absolutely no reason for you to waste so much time.
> > 
> > Wrt end users debugging flows - end users should get all the
> > information they need in the engine log, the second they have to
> > reach the host we've failed.  Improving the logs on the hosts for
> > the sake of the end users is like putting a band aid on a severed
> > limb.
> Unfortunately, this almost never happens. With 3.0 things improved,
> but debugging an ovirt issue is still a complicated business. If you
> want to define a problem and start working on the solution - this is
> it. The ultimate goal is to provide logs that are:
> 1. Uncluttered

FlowID does the opposite.

> 2. easy to read and understand

Adding more (a flowID) to each line makes things less legible.

> 3. do not require special training (you and Saggi know every verb,
> and you know what to look for. Imagine for a split second, that
> someone is not this vdsm-savvy)

FlowID does absolutely nothing towards that end.  What is required for that 
(and should indeed be done) is work on improving the existing logs:
1. reduce verbosity
2. make sure that required points in the flow are logged
3. improve log messages

Don't forget the MSGID change that Keith promised to push as well.

> 4. FAST! (And by this I mean I am able to pinpoint the exact point in

It's a simple matter of grepping for the message that was sent to the engine
(you start from the engine log anyway; otherwise you wouldn't have the flowID).
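
As a rough sketch of what I mean: take the error text the engine reported and
search the host log for it. The log lines and the error text below are made up
for illustration (they are not real vdsm output), and find_message is just a
hypothetical helper name.

```python
def find_message(log_lines, message):
    """Return (line_number, line) pairs for every line containing message."""
    return [
        (number, line)
        for number, line in enumerate(log_lines, start=1)
        if message in line
    ]


# Hypothetical log excerpt; the format is an assumption, not real vdsm output.
sample_log = [
    "Thread-12::DEBUG::polling storage stats",
    "Thread-13::ERROR::Cannot create volume: hypothetical failure",
    "Thread-12::DEBUG::polling storage stats",
]

# The search string is the message as it appeared in the engine log.
hits = find_message(sample_log, "Cannot create volume")
for number, line in hits:
    print(number, line)
```

In practice a plain grep on the log file does the same job; the point is that
the message itself is already the search key.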

> the log where things happened, or failed). Think of this in terms of
> kernel messages log - something goes wrong - you just look for the
> word "ERROR" and possibly something else (hinted in the GUI/API) in
> the log, and that puts you right at the spot. Can we somehow do the
> same? Can we, under the given conditions (lots of different entities
> cluttering the log with very verbose output; complex flows that for
> every action would prepare, check sanity, execute, rollback, finish,
> report; constant polling \hello repostat!\ that also adds to the
> clutter; etc etc) provide easily readable and quickly searchable
> logs, that a rookie can get through?
> 5. Easily follow-able (yes, I do want to be able to see exactly what
> the system does when I click a button in the GUI, flowID or whatever
> else, but it's something people do to understand and learn the
> system, think of additional scenarios, and to avoid shooting

Which is why engine logs now contain every call to vdsm and its return value
(except when the return value is too long, in which case I hope they simply
truncate it).


> This is what I would want to see. I am still convinced a flowID could
> provide at least a partial solution to all of the above, one way or
> another, but if you can provide a better solution, one that will
> adhere to #3, I'd love to hear of it

So let's bring this conversation down to earth.  Please provide example logs 
from a real problem, and let's see what flowID would do to them and which of 
the points above it would solve.


vdsm-devel mailing list
