----- Original Message -----
> 
> 
> ----- Original Message -----
> > From: "Ayal Baron" <aba...@redhat.com>
> > To: "Dan Yasny" <dya...@redhat.com>
> > Cc: "Simon Grinberg" <si...@redhat.com>,
> > vdsm-devel@lists.fedorahosted.org
> > Sent: Friday, 10 February, 2012 12:50:04 AM
> > Subject: Re: [vdsm] flowID schema
> > 
> > 
> > 
> > ----- Original Message -----
> > > > From: "Saggi Mizrahi" <smizr...@redhat.com>
> > > > To: "Keith Robertson" <krobe...@redhat.com>
> > > > Cc: "VDSM Project Development"
> > > > <vdsm-devel@lists.fedorahosted.org>
> > > > Sent: Thursday, February 9, 2012 2:24:44 PM
> > > > Subject: Re: [vdsm] flowID schema
> > > >
> > > > -1
> > > >
> > > > I agree that for a messaging environment having a Message ID is a
> > > > must
> > > > because you sometimes don't have a particular target so when
> > > > you
> > > > get
> > > > a response you need to know what this node is actually
> > > > responding
> > > > to.
> > > >
> > > > The message ID could be composed with <FLOWID><MSGID> so you
> > > > can
> > > > reuse the field.
> > > >
> > > > But that is all beside the point.
> > > >
> > > > I understand that someone might find it fun to go on following
> > > > the
> > > > entire flow in the Engine and in VDSM. But I would like to hear
> > > > an
> > > > actual use case where someone would have actually benefited
> > > > from
> > > > this.
> > > > As I see it, having VDSM return the task ID with every response
> > > > (and
> > > > not just for async tasks) is a lot more useful and correct.
> > > 
> > > Actually, the only way to understand what happened in a certain
> > > flow
> > > is to follow it through. From the engine log where an action was
> > > initiated, down to the hosts that did the execution. Everything
> > > RHEV
> > > does is a flow, and with no correlation between hosts executing
> > > parts of the same flow, troubleshooting turns into guesswork,
> > > because the only contact point left is time, which is useless
> > > when
> > > you're talking about vdsm - there are sometimes hundreds of log
> > > records in a single second, and not every host is in absolute
> > > sync
> > > with every other.
> > 
> > What are you talking about? you know exactly what operation the
> > engine ran at vdsm level.
> 
> Not always true. Haven't had much chance to deeply dive into 3.0
> logs, and if things changed there, it's already a huge step in the
> right direction.

engine logs now always log calls to vdsm.

> 
> > If it's a task then you also have a task
> > id which is a uuid so you don't need anything else.
> 
> Right, but 1. not everything is a task and

Perhaps that should be fixed...

> 2. tasks spawn other tasks, and need to be followed through for that

In vdsm? no they don't

> 3. long running tasks are hell to debug, because they span several
> log files and thousands of lines of logs

flowID would not solve this in any way.
All you need to do to see the entire task is grep for the taskID, since the
thread name is the taskID.
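
A minimal sketch of that grep in Python, for illustration only. It assumes each
log line carries the thread name somewhere in its text and that a task's thread
is named after its taskID. The `lines_for_task` helper, the sample lines and
their layout are hypothetical, not real vdsm output:

```python
def lines_for_task(log_lines, task_id):
    """Return every log line that mentions task_id (same idea as `grep task_id`)."""
    return [line for line in log_lines if task_id in line]


# Hypothetical log excerpt; the layout is an assumption, not the real vdsm format.
sample_log = [
    "Thread-12::DEBUG::2012-02-09 14:00:01::some unrelated call",
    "3f2a9c1e-0000-4000-8000-000000000001::INFO::2012-02-09 14:00:01::task started",
    "Thread-13::DEBUG::2012-02-09 14:00:01::sampling",
    "3f2a9c1e-0000-4000-8000-000000000001::ERROR::2012-02-09 14:00:02::task failed",
]

task_lines = lines_for_task(sample_log, "3f2a9c1e-0000-4000-8000-000000000001")
for line in task_lines:
    print(line)
```

Because the match is on the taskID alone, lines from other threads logged in
the same second are filtered out without relying on timestamps.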

> 
> > In addition, now that engine logs results, you can just grep that
> > instead of a flow id and land at the exact correct command and not
> > have to figure out which out of the 5 run in this flow is the
> > relevant one.
> 
> Haven't seen that yet, but again, what are results? When the failure
> is somewhere in the middle of the flow, the resulting failure can be
> totally irrelevant.

When you have the failure message then all you have to do is grep it in the log 
and reach the exact call that failed.  Just search for "ERROR|FAILED" above 
that and you reach the place of error.
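
The two steps above can be sketched in Python as follows. This is only an
illustration: the `find_error_above` helper and the sample log lines are
hypothetical, and the real log layout may differ.

```python
import re

ERROR_PATTERN = re.compile(r"ERROR|FAILED")


def find_error_above(log_lines, failure_message):
    """Locate failure_message, then walk upward to the nearest ERROR/FAILED line.

    Returns (index_of_failure_message, index_of_error_line); either may be None.
    """
    hit = next(
        (i for i, line in enumerate(log_lines) if failure_message in line), None
    )
    if hit is None:
        return None, None
    # Search strictly above the line that contains the failure message.
    for i in range(hit - 1, -1, -1):
        if ERROR_PATTERN.search(log_lines[i]):
            return hit, i
    return hit, None


# Hypothetical log excerpt, made up for this example.
sample_log = [
    "INFO call createVolume start",
    "ERROR lvcreate returned 5",
    "DEBUG cleaning up",
    "INFO call createVolume result: Cannot create Logical Volume",
]

hit_index, error_index = find_error_above(sample_log, "Cannot create Logical Volume")
print(hit_index, error_index)
```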

> 
> > 
> > If you could give a real example where this would be beneficial
> > (i.e.
> > log excerpts, how you correlated them and how flow id would have
> > eased your job) that would be great.
> 
> Don't have these handy, guess Vladik, who was collecting interesting
> fail flows could have helped.
> 
> 
> > Note that I've also discussed this with Yaniv from qe who said they
> > don't really need it.
> 
> I'm not saying I want to see a flow ID as such, what I _am_ saying is
> that flows are important, and we need an easy way of following them
> through.
> When a user starts a process in the engine, it should be clearly
> logged and marked, then it should consistently report progress and
> interim outputs, say which host was picked for what action, and how
> that action can be identified in the vdsm logs.

That's fine (although periodic logging causes log overflow)

> 
> We cannot rely on timing. We cannot rely on everyone knowing obscure
> engine/vdsm action naming conventions, that are not exactly the same
> as they look in the GUI. A person with no understanding of engine
> and vdsm internals, should be able to easily follow an action
> through to conclusion, and understand what was done at each step,
> what the system got in return and how it reacted. And all of these
> actions should be clearly inter-related, so a single grep can select
> a flow.
> 
> If you prefer to go in the other direction, like turning everything
> into a task, that can also work, but each task, when spawning
> another, should very clearly show the relation as well, and sets of
> tasks should be marked as sets.
> 
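
For reference, the "single grep can select a flow" idea amounts to tagging
every log record with an ID supplied by the caller. A minimal sketch using
Python's standard `logging.LoggerAdapter`; the `FlowAdapter` class, the
`handle_call` function, the `flow=` prefix and the verb names are all
hypothetical and are not taken from the vdsm or engine code:

```python
import io
import logging


class FlowAdapter(logging.LoggerAdapter):
    """Prefix every message with the flow ID passed in `extra`."""

    def process(self, msg, kwargs):
        return "flow=%s %s" % (self.extra["flow_id"], msg), kwargs


# Log into an in-memory buffer so the example is self-contained.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
logger = logging.getLogger("flow_demo")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False


def handle_call(flow_id, verb):
    """Hypothetical API entry point that logs with the caller's flow ID."""
    log = FlowAdapter(logger, {"flow_id": flow_id})
    log.info("%s: start", verb)
    log.info("%s: done", verb)


handle_call("flow-A", "createVolume")
handle_call("flow-B", "getVdsStats")
handle_call("flow-A", "create")

# The "single grep": keep only the records that belong to flow-A.
flow_a = [l for l in stream.getvalue().splitlines() if "flow=flow-A" in l]
for line in flow_a:
    print(line)
```

Note that this only tags records written on the code path that received the
ID; anything done later by thread pools, caches or sampling threads would not
carry it unless the ID were passed along explicitly.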
> > 
> > > 
> > > >
> > > > A generic debugging scenario as I see it.
> > > >
> > > > 1. Something went wrong
> > > > 2. You go looking in the ENGINE log trying to figure out what
> > > > happened.
> > > > 3. You see that ENGINE got SomeError.
> > > 
> > > ok, the rest are all downhill.
> > > 
> > > 4. You follow the failure back to the start of the flow, then go
> > > with
> > > the flow to the point where the engine exited to vdsm
> > > 5. switch over to vdsm logs, make sure you have the timing right
> > > (with no flow ID that's the only orientation after all)
> > > 6. find the start of the vdsm-side flow, follow it to the
> > > failure,
> > > pray the error makes sense.
> > > 
> > > In many cases the answer is not in the vdsm failure traceback but
> > > somewhere in the middle of the flow, with no errors reported,
> > > this
> > > is why we need a way to easily follow things through. Moreover,
> > > the
> > > logs should be readable enough to make sense to a typical
> > > sysadmin,
> > > and not a RHEV expert.
> > > 
> > > > 4. Check to see if this error makes sense imagining that VDSM
> > > > is
> > > > always right and is a black box.
> > > > 5. You did your digging and now you think that VDSM is at
> > > > fault.
> > > > 6. Go look for the call that failed. (If we returned the taskID
> > > > it's
> > > > pretty simple to find that call).
> > > > 7. Look around the call to check VDSM state.
> > > > 8. Profit.
> > > >
> > > > There is never a point where you want to follow a whole flow
> > > > call
> > > > by
> > > > call going back and forth, and even if you did having the VDSM
> > > > taskID is a better anchor than flowID.
> > > 
> > > not everything is a task, flow IDs would unify entire flows, and
> > > make
> > > following them easy.
> > > 
> > > >
> > > > VDSM is built in a way that every call takes in to account the
> > > > current state only. Debugging it with an engine flow mindset is
> > > > just
> > > > wrong and distracting. I see it doing more harm than good by
> > > > reinforcing bad debugging practices.
> > > 
> > > Maybe you're right, though I can't see how from my experience so
> > > far,
> > > but following the flows is the only thing that got cases
> > > resolved.
> > > Not event IDs marking every possible error, and not task IDs
> > > (though
> > > these do have their uses) - slow and meticulous mapping of flows
> > > to
> > > log records.
> > > 
> > > >
> > > > ----- Original Message -----
> > > > > From: "Keith Robertson" <krobe...@redhat.com>
> > > > > To: "VDSM Project Development"
> > > > > <vdsm-devel@lists.fedorahosted.org>
> > > > > Sent: Thursday, February 9, 2012 1:34:43 PM
> > > > > Subject: Re: [vdsm] flowID schema
> > > > >
> > > > > On 02/09/2012 12:18 PM, Andrew Cathrow wrote:
> > > > > >
> > > > > > ----- Original Message -----
> > > > > >> From: "Ayal Baron"<aba...@redhat.com>
> > > > > >> To: "Dan Kenigsberg"<dan...@redhat.com>
> > > > > >> Cc: "VDSM Project
> > > > > >> Development"<vdsm-devel@lists.fedorahosted.org>
> > > > > >> Sent: Monday, February 6, 2012 10:35:54 AM
> > > > > >> Subject: Re: [vdsm] flowID schema
> > > > > >>
> > > > > >>
> > > > > >>
> > > > > >> ----- Original Message -----
> > > > > >>> On Thu, Feb 02, 2012 at 10:32:49AM -0500, Saggi Mizrahi
> > > > > >>> wrote:
> > > > > > >>>> flowID makes no sense after the initial API call because of
> > > > > > >>>> stuff like
> > > > > > >>>> caching\threadpools\samplingtasks\resources\asyncTasks,
> > > > > > >>>> so following a flow like that will not give you the entire
> > > > > > >>>> picture while debugging.
> > > > > >>>>
> > > > > >>>> Also adding it now will make everything even more ugly.
> > > > > >>>> You know what, just imagine I wrote one of my long
> > > > > >>>> rambles
> > > > > >>>> about
> > > > > >>>> why I don't agree with doing this.
> > > > > >>> I cannot imagine you write anything like that. Really. I
> > > > > >>> do
> > > > > >>> not
> > > > > > >>> understand why you object to logging flowID on API entry
> > > > > >>> point.
> > > > > >> The question is, what problem is this really trying to
> > > > > >> solve
> > > > > >> and
> > > > > >> is
> > > > > >> there a simpler and less obtrusive solution to that
> > > > > >> problem?
> > > > > > correlating logs between ovirt engine and potentially
> > > > > > multiple
> > > > > > vdsm
> > > > > > > nodes is a nightmare. It requires a lot of skill to follow a
> > > > > > transaction through from the front end all the way to the
> > > > > > node,
> > > > > > and even multiple nodes (eg actions on spm, then actions on
> > > > > > other
> > > > > > node to run a vm).
> > > > > > Having a way to correlate the logs and follow a single
> > > > > > event/flow
> > > > > > is vital.
> > > > > >
> > > > > +1
> > > > >
> > > > > Knowing what command caused a sequence of events in VDSM
> > > > > would
> > > > > be
> > > > > really
> > > > > helpful particularly in a threaded environment.  Further,
> > > > > wouldn't
> > > > > such
> > > > > an ID be helpful in an asynchronous request/response model?
> > > > >  I'm
> > > > > not
> > > > > sure what the plans are for AMQP or even if there are plans,
> > > > > but
> > > > > I'd
> > > > > think that something like this would be crucial for an async
> > > > > response.
> > > > > So, if you implemented it you might be killing 2 birds with 1
> > > > > stone.
> > > > >
> > > > > FYI: If you want to see examples of other systems that use
> > > > > similar
> > > > > concepts, take a look at the correlation ID in JMS.
> > > > >
> > > > > Cheers,
> > > > > Keith
> > > > >
> > > > >
> > > > > >>> _______________________________________________
> > > > > >>> vdsm-devel mailing list
> > > > > >>> vdsm-devel@lists.fedorahosted.org
> > > > > >>> https://fedorahosted.org/mailman/listinfo/vdsm-devel
> > > > > >>>
> > > >
> > > 
> > > 
> > > --
> > > 
> > > 
> > > 
> > > Regards,
> > > 
> > > Dan Yasny
> > > Red Hat Israel
> > > +972 9769 2280
> > > 
> > 
> 
> --
> 
> 
> 
> Regards,
> 
> Dan Yasny
> Red Hat Israel
> +972 9769 2280
> 
