> From: "Saggi Mizrahi" <smizr...@redhat.com>
> To: "Keith Robertson" <krobe...@redhat.com>
> Cc: "VDSM Project Development" <vdsm-devel@lists.fedorahosted.org>
> Sent: Thursday, February 9, 2012 2:24:44 PM
> Subject: Re: [vdsm] flowID schema
>
> -1
>
> I agree that for messaging environment having a Message ID is a must
> because you sometimes don't have a particular target so when you get
> a response you need to know what this node is actually responding
> to.
>
> The message ID could be composed with <FLOWID><MSGID> so you can
> reuse the field.
>
> But that is all beside the point.
>
> I understand that someone might find it fun to go on following the
> entire flow in the Engine and in VDSM. But I would like to hear an
> actual use case where someone would have actually benefited from
> this.
> As I see it, having VDSM return the task ID with every response (and
> not just for async tasks) is a lot more useful and correct.

Actually, the only way to understand what happened in a certain flow is to
follow it through, from the engine log where an action was initiated down to
the hosts that did the execution. Everything RHEV does is a flow, and with no
correlation between hosts executing parts of the same flow, troubleshooting
turns into guesswork, because the only point of reference left is the
timestamp. That is useless when you're talking about vdsm: there are sometimes
hundreds of log records in a single second, and not every host's clock is in
exact sync with every other's.

>
> A generic debugging scenario as I see it.
>
> 1. Something went wrong
> 2. You go looking in the ENGINE log trying to figure out what
> happened.
> 3. You see that ENGINE got SomeError.

OK, from here on it's all downhill.

4. You follow the failure back to the start of the flow, then go with the flow
to the point where the engine called out to vdsm.
5. Switch over to the vdsm logs and make sure you have the timing right (with no
flow ID, that's the only orientation you have, after all).
6. Find the start of the vdsm-side flow, follow it to the failure, and pray the
error makes sense.

In many cases the answer is not in the vdsm failure traceback but somewhere in
the middle of the flow, with no errors reported. This is why we need a way to
easily follow things through. Moreover, the logs should be readable enough to
make sense to a typical sysadmin, not just to a RHEV expert.
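
To make the point concrete, here is a rough sketch of what following a flow
could look like if every record carried the ID. The log format, the
"flowID=" tag, the host names and the sample lines are all made up for
illustration; nothing here is what the engine or vdsm actually writes today.

```python
import re

# Assumed (hypothetical) log format: "<timestamp> <message> flowID=<id>".
FLOW_RE = re.compile(r"flowID=(\S+)")

def follow_flow(logs, flow_id):
    """Return (timestamp, source, line) for every record tagged with flow_id.

    logs maps a source name (engine, host1, ...) to its list of log lines.
    """
    hits = []
    for source, lines in logs.items():
        for line in lines:
            match = FLOW_RE.search(line)
            if match and match.group(1) == flow_id:
                timestamp = line.split(" ", 1)[0]
                hits.append((timestamp, source, line))
    # Timestamps only order records inside the flow; the ID does the matching.
    return sorted(hits)

# Made-up sample records from one engine and two hosts.
logs = {
    "engine": ["12:00:01.100 RunVm started flowID=ab12",
               "12:00:01.150 AddDisk started flowID=cd34"],
    "host1": ["12:00:01.120 prepareVolume flowID=ab12",
              "12:00:01.130 getVdsStats"],
    "host2": ["12:00:01.300 create vm flowID=ab12"],
}

for timestamp, source, line in follow_flow(logs, "ab12"):
    print(source, line)
```

With the ID in place, the records of one flow can be pulled out of several
logs even when hundreds of unrelated lines share the same second. Clock skew
between hosts can still scramble the ordering, but not the matching.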

> 4. Check to see if this error makes sense imagining that VDSM is
> always right and is a black box.
> 5. You did your digging and now you think that VDSM is at fault.
> 6. Go look for the call that failed. (If we returned the taskID it's
> pretty simple to find that call).
> 7. Look around the call to check VDSM state.
> 8. Profit.
>
> There is never a point where you want to follow a whole flow call by
> call going back and forth, and even if you did having the VDSM
> taskID is a better anchor than flowID.

Not everything is a task. Flow IDs would unify entire flows and make following
them easy.

>
> VDSM is built in a way that every call takes in to account the
> current state only. Debugging it with an engine flow mindset is just
> wrong and distracting. I see it doing more harm than good by
> reinforcing bad debugging practices.

Maybe you're right, though I can't see how. In my experience so far, following
the flows is the only thing that got cases resolved: not event IDs marking
every possible error, and not task IDs (though these do have their uses), but
slow and meticulous mapping of flows to log records.

>
> ----- Original Message -----
> > From: "Keith Robertson" <krobe...@redhat.com>
> > To: "VDSM Project Development" <vdsm-devel@lists.fedorahosted.org>
> > Sent: Thursday, February 9, 2012 1:34:43 PM
> > Subject: Re: [vdsm] flowID schema
> >
> > On 02/09/2012 12:18 PM, Andrew Cathrow wrote:
> > >
> > > ----- Original Message -----
> > >> From: "Ayal Baron"<aba...@redhat.com>
> > >> To: "Dan Kenigsberg"<dan...@redhat.com>
> > >> Cc: "VDSM Project
> > >> Development"<vdsm-devel@lists.fedorahosted.org>
> > >> Sent: Monday, February 6, 2012 10:35:54 AM
> > >> Subject: Re: [vdsm] flowID schema
> > >>
> > >>
> > >>
> > >> ----- Original Message -----
> > >>> On Thu, Feb 02, 2012 at 10:32:49AM -0500, Saggi Mizrahi wrote:
> > >>>> flowID makes no sense after the initial API call, as stuff like
> > >>>> caching\threadpools\samplingtasks\resources\asyncTasks gets
> > >>>> involved, so following
> > >>>> a flow like that will not give you the entire picture while
> > >>>> debugging.
> > >>>>
> > >>>> Also adding it now will make everything even more ugly.
> > >>>> You know what, just imagine I wrote one of my long rambles
> > >>>> about
> > >>>> why I don't agree with doing this.
> > >>> I cannot imagine you write anything like that. Really. I do not
> > >>> understand why you object logging flowID on API entry point.
> > >> The question is, what problem is this really trying to solve and
> > >> is
> > >> there a simpler and less obtrusive solution to that problem?
> > > correlating logs between ovirt engine and potentially multiple
> > > vdsm
> > > nodes is a nightmare. It requires a lot of skill to follow a
> > > transaction through from the front end all the way to the node,
> > > and even multiple nodes (eg actions on spm, then actions on other
> > > node to run a vm).
> > > Having a way to correlate the logs and follow a single event/flow
> > > is vital.
> > >
> > +1
> >
> > Knowing what command caused a sequence of events in VDSM would be
> > really
> > helpful particularly in a threaded environment.  Further, wouldn't
> > such
> > an ID be helpful in an asynchronous request/response model?  I'm
> > not
> > sure what the plans are for AMQP or even if there are plans, but
> > I'd
> > think that something like this would be crucial for an async
> > response.
> > So, if you implemented it you might be killing 2 birds with 1
> > stone.
> >
> > FYI: If you want to see examples of other systems that use similar
> > concepts, take a look at the correlation ID in JMS.
> >
> > Cheers,
> > Keith
> >
> >
> > >>> _______________________________________________
> > >>> vdsm-devel mailing list
> > >>> vdsm-devel@lists.fedorahosted.org
> > >>> https://fedorahosted.org/mailman/listinfo/vdsm-devel
> > >>>


-- 



Regards, 

Dan Yasny 
Red Hat Israel 
+972 9769 2280