----- Original Message -----
> From: "Ayal Baron" <aba...@redhat.com>
> To: "Dan Yasny" <dya...@redhat.com>
> Cc: "Simon Grinberg" <si...@redhat.com>, vdsm-devel@lists.fedorahosted.org
> Sent: Friday, 10 February, 2012 3:51:01 PM
> Subject: Re: [vdsm] flowID schema
> 
> 
> 
> ----- Original Message -----
> > 
> > 
> > ----- Original Message -----
> > > From: "Ayal Baron" <aba...@redhat.com>
> > > To: "Dan Yasny" <dya...@redhat.com>
> > > Cc: "Simon Grinberg" <si...@redhat.com>,
> > > vdsm-devel@lists.fedorahosted.org
> > > Sent: Friday, 10 February, 2012 2:55:46 PM
> > > Subject: Re: [vdsm] flowID schema
> > > 
> > > 
> > > 
> > > ----- Original Message -----
> > > > 
> > > > 
> > > > ----- Original Message -----
> > > > > From: "Ayal Baron" <aba...@redhat.com>
> > > > > To: "Dan Yasny" <dya...@redhat.com>
> > > > > Cc: "Simon Grinberg" <si...@redhat.com>,
> > > > > vdsm-devel@lists.fedorahosted.org
> > > > > Sent: Friday, 10 February, 2012 12:50:04 AM
> > > > > Subject: Re: [vdsm] flowID schema
> > > > > 
> > > > > 
> > > > > 
> > > > > ----- Original Message -----
> > > > > > > From: "Saggi Mizrahi" <smizr...@redhat.com>
> > > > > > > To: "Keith Robertson" <krobe...@redhat.com>
> > > > > > > Cc: "VDSM Project Development"
> > > > > > > <vdsm-devel@lists.fedorahosted.org>
> > > > > > > Sent: Thursday, February 9, 2012 2:24:44 PM
> > > > > > > Subject: Re: [vdsm] flowID schema
> > > > > > >
> > > > > > > -1
> > > > > > >
> > > > > > > I agree that in a messaging environment having a message ID is
> > > > > > > a must, because you sometimes don't have a particular target, so
> > > > > > > when you get a response you need to know what this node is
> > > > > > > actually responding to.
> > > > > > >
> > > > > > > The message ID could be composed as <FLOWID><MSGID>, so you can
> > > > > > > reuse the field.
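
A minimal sketch of the <FLOWID><MSGID> idea. It assumes a colon separator and UUID flow IDs; both are illustrative choices, not an existing vdsm or engine format.

```python
import uuid
from typing import Tuple

# Assumed layout "<FLOWID>:<MSGID>"; the separator is a hypothetical choice.
SEPARATOR = ":"


def compose_message_id(flow_id: str, msg_id: str) -> str:
    """Pack a flow ID and a per-message ID into a single message-ID field."""
    if SEPARATOR in flow_id:
        raise ValueError("flow ID must not contain the separator")
    return f"{flow_id}{SEPARATOR}{msg_id}"


def split_message_id(message_id: str) -> Tuple[str, str]:
    """Recover (flow_id, msg_id) by splitting on the first separator only."""
    flow_id, _, msg_id = message_id.partition(SEPARATOR)
    return flow_id, msg_id


flow_id = str(uuid.uuid4())
first = compose_message_id(flow_id, "1")
second = compose_message_id(flow_id, "2")
# Both messages share the flow prefix but remain distinct message IDs.
print(split_message_id(first)[0] == split_message_id(second)[0])  # True
print(first != second)  # True
```

Because the flow ID is a prefix, a grep for it still finds every message in the flow, while the full value stays unique per message.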
> > > > > > >
> > > > > > > But that is all beside the point.
> > > > > > >
> > > > > > > I understand that someone might find it fun to follow the entire
> > > > > > > flow in the Engine and in VDSM. But I would like to hear an actual
> > > > > > > use case where someone would have actually benefited from this.
> > > > > > > As I see it, having VDSM return the task ID with every response
> > > > > > > (and not just for async tasks) is a lot more useful and correct.
> > > > > > 
> > > > > > Actually, the only way to understand what happened in a certain flow
> > > > > > is to follow it through, from the engine log where an action was
> > > > > > initiated down to the hosts that did the execution. Everything RHEV
> > > > > > does is a flow, and with no correlation between hosts executing parts
> > > > > > of the same flow, troubleshooting turns into guesswork, because the
> > > > > > only contact point left is time, which is useless when you're talking
> > > > > > about vdsm: there are sometimes hundreds of log records in a single
> > > > > > second, and not every host's clock is in absolute sync with every
> > > > > > other.
> > > > > 
> > > > > What are you talking about? You know exactly what operation the
> > > > > engine ran at the vdsm level.
> > > > 
> > > > Not always true. I haven't had much chance to dive deeply into the
> > > > 3.0 logs, and if things changed there, it's already a huge step in the
> > > > right direction.
> > > 
> > > The engine now always logs calls to vdsm.
> > 
> > finally! :)
> > 
> > > 
> > > > 
> > > > > If it's a task, then you also have a task ID, which is a UUID, so you
> > > > > don't need anything else.
> > > > 
> > > > Right, but 1. not everything is a task and
> > > 
> > > Perhaps that should be fixed...
> > > 
> > > > 2. tasks spawn other tasks, and need to be followed through for that
> > > 
> > > In vdsm? no they don't
> > 
> > Err, OK, in the engine then. The point being, a flow like
> > MoveMultipleImages (pardon my bad memory for spelling) spawns several
> > tasks, and even within those tasks there's usually more than a single
> > flow.
> > 
> > > 
> > > > 3. long-running tasks are hell to debug, because they span several
> > > > log files and thousands of lines of logs
> > > 
> > > A flow ID would not solve this in any way.
> > > All you need to do to see the entire task is grep for the task ID, as
> > > the thread name is the task ID.
> > 
> > which means I have to:
> > 1. jump between engine and vdsm logs, comparing task IDs and actions;
> > 2. do it many times in the case of a complex flow.
> > 
> > Seeing the entire thing on RHEV/oVirt as a system would give a better
> > picture, and being able to grep the entire set of logs for a single ID
> > is a huge plus, since it would produce one large log that (hopefully,
> > and if everything is done right) shows the full flow, step by step,
> > everywhere it took place.
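
A minimal sketch of the "single grep across the entire set of logs" idea: collect every line that mentions one ID from several logs, tagged with the log it came from. The log names, the `[flow=...]` tag format, and the line contents are made up for illustration. Lines are kept in per-file order rather than merged by timestamp, since host clocks may not be in sync.

```python
from typing import Dict, Iterable, List, Tuple


def grep_flow(logs: Dict[str, Iterable[str]],
              flow_id: str) -> List[Tuple[str, str]]:
    """Return (log_name, line) pairs for every line containing flow_id."""
    hits = []
    for name, lines in logs.items():
        for line in lines:
            if flow_id in line:
                hits.append((name, line.rstrip("\n")))
    return hits


# Made-up excerpts; real logs could be passed as open file objects instead.
logs = {
    "engine.log": ["[flow=abc] action started", "[flow=xyz] other action"],
    "host1/vdsm.log": ["[flow=abc] step 1 begin", "[flow=abc] step 1 done"],
    "host2/vdsm.log": ["[flow=xyz] step 1 begin"],
}
for name, line in grep_flow(logs, "flow=abc"):
    print(f"{name}: {line}")
```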
> 
> Which is never what you really want, as it is pretty useless. What you
> want is the point of failure and a few hundred lines before it.

The problem is that the point of failure is not always the point with the
Python traceback. This is why a flow should be followable.

> The rest is just a distraction, which is the point of this thread. I
> have never encountered a case where I needed such a flow to debug an
> issue (even when it spread across multiple hosts).

No, the rest is an indication of flow progress, which can help with
troubleshooting issues that don't end in an error, like performance and
scalability issues.


> 
> > 
> > > 
> > > > 
> > > > > In addition, now that the engine logs results, you can just grep
> > > > > for that instead of a flow ID and land on the exact command, without
> > > > > having to figure out which of the 5 run in this flow is the relevant
> > > > > one.
> > > > 
> > > > I haven't seen that yet, but again, what are results? When the failure
> > > > is somewhere in the middle of the flow, the resulting failure can be
> > > > totally irrelevant.
> > > 
> > > When you have the failure message, all you have to do is grep for it
> > > in the log and reach the exact call that failed. Just search for
> > > "ERROR|FAILED" above that and you reach the place of error.
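
The search described above (find the failure message, then look above it for the nearest `ERROR` or `FAILED` line) can be sketched as follows. The log lines are invented for the example, and only the first occurrence of the failure message is considered.

```python
import re
from typing import List, Optional

ERROR_RE = re.compile(r"ERROR|FAILED")


def find_error_before(lines: List[str], failure_msg: str) -> Optional[int]:
    """Locate failure_msg, then scan upward for the nearest ERROR/FAILED line.

    Returns the index of that earlier line, or None if nothing matches.
    """
    for i, line in enumerate(lines):
        if failure_msg in line:
            # Start strictly above the failure line itself.
            for j in range(i - 1, -1, -1):
                if ERROR_RE.search(lines[j]):
                    return j
            return None
    return None


log_lines = [
    "INFO  step A ok",
    "ERROR step B: volume not found",
    "INFO  step C ok",
    "ERROR final: Cannot complete operation",
]
idx = find_error_before(log_lines, "Cannot complete operation")
print(idx, log_lines[idx])  # 1 ERROR step B: volume not found
```

As the reply below points out, the nearest earlier error is not necessarily the root cause; the sketch only automates the search.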
> > 
> > That's the typical approach to logs, and it works for other systems.
> > With RHEV, just finding the traceback is never the solution, because it
> > shows where everything finally went downhill, instead of where the error
> > happened.
> > 
> > > 
> > > > 
> > > > > 
> > > > > If you could give a real example where this would be beneficial
> > > > > (i.e. log excerpts, how you correlated them, and how a flow ID would
> > > > > have eased your job), that would be great.
> > > > 
> > > > I don't have these handy; I guess Vladik, who was collecting
> > > > interesting failure flows, could have helped.
> > > > 
> > > > 
> > > > > Note that I've also discussed this with Yaniv from QE, who said they
> > > > > don't really need it.
> > > > 
> > > > I'm not saying I want to see a flow ID as such; what I _am_ saying is
> > > > that flows are important, and we need an easy way of following them
> > > > through.
> > > > When a user starts a process in the engine, it should be clearly logged
> > > > and marked; then it should consistently report progress and interim
> > > > outputs, say which host was picked for what action, and how that action
> > > > can be identified in the vdsm logs.
> > > 
> > > That's fine (although periodic logging causes log overflow).
> > 
> > Can't those be aggregated somehow? Or sent into a different log?
> > There was even an idea of maintaining a local DB for logging instead of
> > plain text; SQLite would serve that purpose.
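
A minimal sketch of the local-database idea, using Python's standard `logging` and `sqlite3` modules: a handler that writes each record into an SQLite table, so a flow can be selected with a query instead of a grep. The table layout and the `flow_id` record attribute (passed via `extra=`) are assumptions for illustration.

```python
import logging
import sqlite3


class SQLiteHandler(logging.Handler):
    """Store log records as rows in an SQLite table (hypothetical layout)."""

    def __init__(self, path: str = ":memory:"):
        super().__init__()
        # logging.Handler serializes emit() with its own lock, so sharing
        # the connection across threads is acceptable here.
        self.conn = sqlite3.connect(path, check_same_thread=False)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS log ("
            " created REAL, level TEXT, flow_id TEXT, message TEXT)"
        )

    def emit(self, record: logging.LogRecord) -> None:
        try:
            self.conn.execute(
                "INSERT INTO log VALUES (?, ?, ?, ?)",
                (record.created, record.levelname,
                 getattr(record, "flow_id", None), record.getMessage()),
            )
            self.conn.commit()
        except Exception:
            self.handleError(record)


handler = SQLiteHandler()
logger = logging.getLogger("flowdb")
logger.setLevel(logging.INFO)
logger.addHandler(handler)

logger.info("copy started", extra={"flow_id": "abc"})
logger.info("unrelated", extra={"flow_id": "xyz"})
logger.info("copy done", extra={"flow_id": "abc"})

rows = handler.conn.execute(
    "SELECT message FROM log WHERE flow_id = ? ORDER BY rowid", ("abc",)
).fetchall()
print([r[0] for r in rows])  # ['copy started', 'copy done']
```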
> > 
> > > 
> > > > 
> > > > We cannot rely on timing. We cannot rely on everyone knowing obscure
> > > > engine/vdsm action naming conventions, which are not exactly the same
> > > > as the names shown in the GUI. A person with no understanding of engine
> > > > and vdsm internals should be able to easily follow an action through to
> > > > conclusion, and understand what was done at each step, what the system
> > > > got in return, and how it reacted. And all of these actions should be
> > > > clearly interrelated, so a single grep can select a flow.
> > > > 
> > > > If you prefer to go in the other direction, like turning everything
> > > > into a task, that can also work, but each task, when spawning another,
> > > > should very clearly show the relation as well, and sets of tasks should
> > > > be marked as sets.
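
If everything becomes a task, the relation asked for here could be as simple as each task recording its parent and the set it belongs to. A minimal sketch with hypothetical class and field names; the task labels are illustrative, and nothing here reflects how the engine or vdsm actually model tasks.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Task:
    name: str
    parent_id: Optional[str] = None
    set_id: Optional[str] = None
    task_id: str = field(default_factory=lambda: str(uuid.uuid4()))


class TaskRegistry:
    """Hypothetical registry that records parent/child and set membership."""

    def __init__(self) -> None:
        self.tasks: Dict[str, Task] = {}

    def start(self, name: str, parent: Optional[Task] = None) -> Task:
        # A child inherits its parent's set; a root task opens a new set.
        set_id = parent.set_id if parent else str(uuid.uuid4())
        task = Task(name, parent.task_id if parent else None, set_id)
        self.tasks[task.task_id] = task
        return task

    def members(self, set_id: str) -> List[Task]:
        return [t for t in self.tasks.values() if t.set_id == set_id]

    def lineage(self, task: Task) -> List[str]:
        """Task names from the root task down to `task`."""
        chain = []
        current: Optional[Task] = task
        while current is not None:
            chain.append(current.name)
            current = (self.tasks.get(current.parent_id)
                       if current.parent_id else None)
        return list(reversed(chain))


reg = TaskRegistry()
root = reg.start("MoveMultipleImages")
copy1 = reg.start("copy", parent=root)
reg.start("copy", parent=root)
reg.start("unrelated")
print(len(reg.members(root.set_id)))  # 3
print(reg.lineage(copy1))  # ['MoveMultipleImages', 'copy']
```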
> > > > 
> > > > > 
> > > > > > 
> > > > > > >
> > > > > > > A generic debugging scenario as I see it.
> > > > > > >
> > > > > > > 1. Something went wrong.
> > > > > > > 2. You go looking in the ENGINE log, trying to figure out what
> > > > > > >    happened.
> > > > > > > 3. You see that ENGINE got SomeError.
> > > > > > OK, the rest is all downhill.
> > > > > > 
> > > > > > 4. You follow the failure back to the start of the flow, then go
> > > > > >    with the flow to the point where the engine exited to vdsm.
> > > > > > 5. Switch over to the vdsm logs and make sure you have the timing
> > > > > >    right (with no flow ID, that's the only orientation after all).
> > > > > > 6. Find the start of the vdsm-side flow, follow it to the failure,
> > > > > >    and pray the error makes sense.
> > > > > > 
> > > > > > In many cases the answer is not in the vdsm failure traceback but
> > > > > > somewhere in the middle of the flow, with no errors reported; this is
> > > > > > why we need a way to easily follow things through. Moreover, the logs
> > > > > > should be readable enough to make sense to a typical sysadmin, not
> > > > > > just to a RHEV expert.
> > > > > > 
> > > > > > > 4. Check to see if this error makes sense, imagining that VDSM is
> > > > > > >    always right and is a black box.
> > > > > > > 5. You did your digging and now you think that VDSM is at fault.
> > > > > > > 6. Go look for the call that failed. (If we returned the task ID,
> > > > > > >    it's pretty simple to find that call.)
> > > > > > > 7. Look around the call to check VDSM state.
> > > > > > > 8. Profit.
> > > > > > >
> > > > > > > There is never a point where you want to follow a whole flow call
> > > > > > > by call, going back and forth, and even if you did, the VDSM task
> > > > > > > ID is a better anchor than a flow ID.
> > > > > > 
> > > > > > Not everything is a task. Flow IDs would unify entire flows and make
> > > > > > following them easy.
> > > > > > 
> > > > > > >
> > > > > > > VDSM is built in a way that every call takes into account only the
> > > > > > > current state. Debugging it with an engine flow mindset is just
> > > > > > > wrong and distracting. I see it doing more harm than good by
> > > > > > > reinforcing bad debugging practices.
> > > > > > 
> > > > > > Maybe you're right, though I can't see how from my experience so far;
> > > > > > following the flows is the only thing that got cases resolved. Not
> > > > > > event IDs marking every possible error, and not task IDs (though these
> > > > > > do have their uses), but slow and meticulous mapping of flows to log
> > > > > > records.
> > > > > > 
> > > > > > >
> > > > > > > ----- Original Message -----
> > > > > > > > From: "Keith Robertson" <krobe...@redhat.com>
> > > > > > > > To: "VDSM Project Development"
> > > > > > > > <vdsm-devel@lists.fedorahosted.org>
> > > > > > > > Sent: Thursday, February 9, 2012 1:34:43 PM
> > > > > > > > Subject: Re: [vdsm] flowID schema
> > > > > > > >
> > > > > > > > On 02/09/2012 12:18 PM, Andrew Cathrow wrote:
> > > > > > > > >
> > > > > > > > > ----- Original Message -----
> > > > > > > > >> From: "Ayal Baron"<aba...@redhat.com>
> > > > > > > > >> To: "Dan Kenigsberg"<dan...@redhat.com>
> > > > > > > > >> Cc: "VDSM Project
> > > > > > > > >> Development"<vdsm-devel@lists.fedorahosted.org>
> > > > > > > > >> Sent: Monday, February 6, 2012 10:35:54 AM
> > > > > > > > >> Subject: Re: [vdsm] flowID schema
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >> ----- Original Message -----
> > > > > > > > >>> On Thu, Feb 02, 2012 at 10:32:49AM -0500, Saggi Mizrahi wrote:
> > > > > > > > >>>> flowID makes no sense after the initial API call, as stuff
> > > > > > > > >>>> like caching\threadpools\samplingtasks\resources\asyncTasks
> > > > > > > > >>>> breaks the flow, so following a flow like that will not give
> > > > > > > > >>>> you the entire picture while debugging.
> > > > > > > > >>>>
> > > > > > > > >>>> Also, adding it now will make everything even uglier.
> > > > > > > > >>>> You know what, just imagine I wrote one of my long rambles
> > > > > > > > >>>> about why I don't agree with doing this.
> > > > > > > > >>> I cannot imagine you write anything like that. Really.
> > > > > > > > >>> I do not understand why you object to logging flowID at the
> > > > > > > > >>> API entry point.
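
Logging a flow ID at the API entry point could look roughly like this sketch: the caller passes a flow ID, the entry point stores it in a `contextvars.ContextVar`, and a `logging.Filter` on the handler stamps it on every record emitted while the call runs. The `flow_id` keyword argument and the `destroy_vm` verb are hypothetical. Saggi's objection still applies: work handed off to thread pools does not inherit the context unless it is copied explicitly (e.g. with `contextvars.copy_context()`).

```python
import contextvars
import functools
import logging

current_flow = contextvars.ContextVar("current_flow", default="-")


class FlowIdFilter(logging.Filter):
    """Attach the current flow ID to every record passing through."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.flow_id = current_flow.get()
        return True


def api_entry(func):
    """Log the call and bind the caller's flow ID for its duration."""
    @functools.wraps(func)
    def wrapper(*args, flow_id="-", **kwargs):
        token = current_flow.set(flow_id)
        try:
            logging.getLogger("api").info("call %s", func.__name__)
            return func(*args, **kwargs)
        finally:
            current_flow.reset(token)  # unbind once the call returns
    return wrapper


# The filter sits on the handler, so it also sees records propagated
# from child loggers such as "api" and "vm".
handler = logging.StreamHandler()
handler.addFilter(FlowIdFilter())
handler.setFormatter(logging.Formatter("flow=%(flow_id)s %(name)s %(message)s"))
root = logging.getLogger()
root.addHandler(handler)
root.setLevel(logging.INFO)


@api_entry
def destroy_vm(vm_id):
    # Hypothetical verb; anything it logs is stamped with the flow ID.
    logging.getLogger("vm").info("destroying %s", vm_id)
    return {"status": {"code": 0}}


destroy_vm("vm-1", flow_id="3f2a")
# stderr:
# flow=3f2a api call destroy_vm
# flow=3f2a vm destroying vm-1
```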
> > > > > > > > >> The question is, what problem is this really trying to solve,
> > > > > > > > >> and is there a simpler and less obtrusive solution to that
> > > > > > > > >> problem?
> > > > > > > > > Correlating logs between oVirt engine and potentially multiple
> > > > > > > > > vdsm nodes is a nightmare. It requires a lot of skill to follow
> > > > > > > > > a transaction through from the front end all the way to the
> > > > > > > > > node, and even multiple nodes (e.g. actions on the SPM, then
> > > > > > > > > actions on another node to run a VM).
> > > > > > > > > Having a way to correlate the logs and follow a single
> > > > > > > > > event/flow is vital.
> > > > > > > > >
> > > > > > > > +1
> > > > > > > >
> > > > > > > > Knowing what command caused a sequence of events in VDSM would be
> > > > > > > > really helpful, particularly in a threaded environment. Further,
> > > > > > > > wouldn't such an ID be helpful in an asynchronous request/response
> > > > > > > > model? I'm not sure what the plans are for AMQP, or even if there
> > > > > > > > are plans, but I'd think that something like this would be crucial
> > > > > > > > for an async response. So, if you implemented it, you might be
> > > > > > > > killing 2 birds with 1 stone.
> > > > > > > >
> > > > > > > > FYI: If you want to see examples of other systems that use similar
> > > > > > > > concepts, take a look at the correlation ID in JMS.
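
The correlation-ID pattern referred to here (in JMS, a reply can carry the ID of the request it answers in its `JMSCorrelationID` header) can be sketched without any messaging library: the sender remembers pending requests by ID, and each reply is routed back through its correlation ID. The message shape and the `Correlator` and `responder` names are assumptions; this is not the JMS API itself.

```python
import uuid
from typing import Any, Callable, Dict


class Correlator:
    """Match asynchronous replies to the requests that caused them."""

    def __init__(self) -> None:
        self.pending: Dict[str, Callable[[Any], None]] = {}

    def send(self, payload: Any, on_reply: Callable[[Any], None]) -> dict:
        message_id = str(uuid.uuid4())
        self.pending[message_id] = on_reply
        return {"message_id": message_id, "payload": payload}

    def receive(self, reply: dict) -> bool:
        """Dispatch a reply; return False if no pending request matches."""
        callback = self.pending.pop(reply.get("correlation_id"), None)
        if callback is None:
            return False
        callback(reply["payload"])
        return True


def responder(request: dict) -> dict:
    # Stand-in for a remote node: echo the request ID as the correlation ID.
    return {"correlation_id": request["message_id"],
            "payload": f"done: {request['payload']}"}


results = []
c = Correlator()
req_a = c.send("task A", results.append)
req_b = c.send("task B", results.append)
# Replies may arrive out of order; the correlation ID still routes them.
c.receive(responder(req_b))
c.receive(responder(req_a))
print(results)  # ['done: task B', 'done: task A']
```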
> > > > > > > >
> > > > > > > > Cheers,
> > > > > > > > Keith
> > > > > > > >
> > > > > > > >
> > > > > > > > >>> _______________________________________________
> > > > > > > > >>> vdsm-devel mailing list
> > > > > > > > >>> vdsm-devel@lists.fedorahosted.org
> > > > > > > > >>> https://fedorahosted.org/mailman/listinfo/vdsm-devel
> > > > > > > > >>>
> > > > > > > > >>
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > 
> > > > > > 
> > > > > > --
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > Regards,
> > > > > > 
> > > > > > Dan Yasny
> > > > > > Red Hat Israel
> > > > > > +972 9769 2280
> > > > > > 
> > > > > 
> > > > 
> > > > 
> > > 
> > 
> > 
> 

-- 



Regards, 

Dan Yasny 
Red Hat Israel 
+972 9769 2280