----- Original Message -----
> From: "Ayal Baron" <aba...@redhat.com>
> To: "Dan Yasny" <dya...@redhat.com>
> Cc: "Simon Grinberg" <si...@redhat.com>, vdsm-devel@lists.fedorahosted.org
> Sent: Friday, 10 February, 2012 2:55:46 PM
> Subject: Re: [vdsm] flowID schema
> 
> 
> 
> ----- Original Message -----
> > 
> > 
> > ----- Original Message -----
> > > From: "Ayal Baron" <aba...@redhat.com>
> > > To: "Dan Yasny" <dya...@redhat.com>
> > > Cc: "Simon Grinberg" <si...@redhat.com>,
> > > vdsm-devel@lists.fedorahosted.org
> > > Sent: Friday, 10 February, 2012 12:50:04 AM
> > > Subject: Re: [vdsm] flowID schema
> > > 
> > > 
> > > 
> > > ----- Original Message -----
> > > > > From: "Saggi Mizrahi" <smizr...@redhat.com>
> > > > > To: "Keith Robertson" <krobe...@redhat.com>
> > > > > Cc: "VDSM Project Development"
> > > > > <vdsm-devel@lists.fedorahosted.org>
> > > > > Sent: Thursday, February 9, 2012 2:24:44 PM
> > > > > Subject: Re: [vdsm] flowID schema
> > > > >
> > > > > -1
> > > > >
> > > > > I agree that for messaging environment having a Message ID is
> > > > > a
> > > > > must
> > > > > because you sometimes don't have a particular target so when
> > > > > you
> > > > > get
> > > > > a response you need to know what this node is actually
> > > > > responding
> > > > > to.
> > > > >
> > > > > The message ID could be composed with <FLOWID><MSGID> so you
> > > > > can
> > > > > reuse the field.
> > > > >
> > > > > But that is all besides the point.
> > > > >
> > > > > I understand that someone might find it fun to go on
> > > > > following
> > > > > the
> > > > > entire flow in the Engine and in VDSM. But I would like to
> > > > > hear
> > > > > an
> > > > > actual use case where someone would have actually benefited
> > > > > from
> > > > > this.
> > > > > As I see it having VSDM return the task ID with every
> > > > > response
> > > > > (and
> > > > > not just for async tasks) is a lot more useful and correct.
> > > > 
> > > > Actually, the only way to understand what happened in a certain
> > > > flow
> > > > is to follow it through. From the engine log where an action
> > > > was
> > > > initiated, down to the hosts that did the execution. Everything
> > > > RHEV
> > > > does is a flow, and with no correlation between hosts executing
> > > > parts of the same flow, troubleshooting turns into guesswork,
> > > > because the only contact point left is time, which is useless
> > > > when
> > > > you're talking about vdsm - there are sometimes hundreds of log
> > > > records in a single second, and not every host is in absolute
> > > > sync
> > > > with every other.
> > > 
> > > What are you talking about? you know exactly what operation the
> > > engine ran at vdsm level.
> > 
> > Not always true. Haven't had much chance to deeply dive into 3.0
> > logs, and if things changed there, it's already a huge step in the
> > right direction.
> 
> engine logs now always log calls to vdsm.

finally! :)

> 
> > 
> > > If it's a task then you also have a task
> > > id which is a uuid so you don't need anything else.
> > 
> > Right, but 1. not everything is a task and
> 
> Perhaps that should be fixed...
> 
> > 2. tasks spawn other tasks, and need to be followed through for
> > that
> 
> In vdsm? no they don't

Err, ok, in the engine then. The point being, a flow like MoveMultipleImages 
(pardon my bad memory for spelling) spawns several tasks, and even within those 
tasks there's usually more than a single flow.

> 
> > 3. long running tasks are hell to debug, because they span several
> > log files and thousands lines of logs
> 
> flowid would not solve this in any way.
> All you need to do to see the entire task is grep the taskID as the
> thread name is the taskID

which causes me to
1. jump between engine and vdsm logs, comparing task IDs and actions. 
2. do it many times in case of a complex flow

Seeing the entire thing on RHEV/oVirt as a system would make a better picture, 
and being able to grep the entire set of logs for a single ID is a huge plus, 
since it would produce one large log that (hopefully, and if everything is done 
right) shows the full flow, step by step, everywhere it took place

> 
> > 
> > > In addition, now that engine logs results, you can just grep that
> > > instead of a flow id and land at the exact correct command and
> > > not
> > > have to figure out which out of the 5 run in this flow is the
> > > relevant one.
> > 
> > Haven't seen that yet, but again, what are results? When the
> > failure
> > is somewhere in the middle of the flow, the resulting failure can
> > be
> > totally irrelevant.
> 
> When you have the failure message then all you have to do is grep it
> in the log and reach the exact call that failed.  Just search for
> "ERROR|FAILED" above that and you reach the place of error.

It's the typical approach to logs, that works for other systems. With RHEV, 
just finding the traceback is never the solution, because it shows where 
everything finally went offhill, instead of where the error happened.

> 
> > 
> > > 
> > > If you could give a real example where this would be beneficial
> > > (i.e.
> > > log excerpts, how you correlated them and how flow id would have
> > > eased your job) that would be great.
> > 
> > Don't have these handy, guess Vladik, who was collecting
> > interesting
> > fail flows could have helped.
> > 
> > 
> > > Note that I've also discussed this with Yaniv from qe who said
> > > they
> > > don't really need it.
> > 
> > I'm not saying I want to see a flow ID as such, what I _am_ saying
> > is
> > that flows are important, and we need an easy way of following them
> > through.
> > When a user starts a process in the engine, it should be clearly
> > logged and marked, then it should consistently report progress and
> > interim outputs, say which host was picked for what action, and how
> > that action can be identified in the vdsm logs.
> 
> That's fine (although periodic logging causes log overflow)

Can't those be aggregated somehow? Or sent into a different log? There was even 
an idea of maintaining a local db for logging instead of plaintext, sqlite 
would serve for that purpose

> 
> > 
> > We cannot rely on timing. We cannot rely on everyone knowing
> > obscure
> > engine/vdsm action naming conventions, that are not exactly the
> > same
> > as they look in the GUI. A person with no understanding of engine
> > and vdsm internals, should be able to easily follow an action
> > through to conclusion, and understand what was done at each step,
> > what the system got in return and how it reacted. And all of these
> > actions should be clearly inter-related, so a single grep can
> > select
> > a flow.
> > 
> > If you prefer to go in the other direction, like turning everything
> > into a task, that can also work, but each task, when spawning
> > another, should very clearly show the relation as well, and sets of
> > tasks should be marked as sets.
> > 
> > > 
> > > > 
> > > > >
> > > > > A generic debugging scenario as I see it.
> > > > >
> > > > > 1. Something went wrong
> > > > > 2. You go looking in the ENGINE log trying to figure out what
> > > > > happend.
> > > > > 3. You see that ENGINE got SomeError.
> > > > 
> > > > ok, the rest are all downhill.
> > > > 
> > > > 4. You follow the failure back to the start of the flow, then
> > > > go
> > > > with
> > > > the flow to the point where the engine exited to vdsm
> > > > 5. switch over to vdsm logs, make sure you have the timing
> > > > right
> > > > (with no flow ID that's the olny orientation after all)
> > > > 6. find the start of the vdsm-side flow, follow it to the
> > > > failure,
> > > > pray the error makes sense.
> > > > 
> > > > In many cases the answer is not in the vdsm failure traceback
> > > > but
> > > > somewhere in the middle of the flow, with no errors reported,
> > > > this
> > > > is why we need a way to easily follow things through. Moreover,
> > > > the
> > > > logs should be readable enough to make sense to a typical
> > > > sysadmin,
> > > > and not a RHEV expert.
> > > > 
> > > > > 4. Check to see if this error makes sense imagining that VDSM
> > > > > is
> > > > > always right and is a black box.
> > > > > 5. You did your digging and now you think that VDSM is as
> > > > > fault.
> > > > > 6. Go look for the call that failed. (If we returned the
> > > > > taskID
> > > > > it's
> > > > > pretty simple to find that call).
> > > > > 7. Look around the call to check VDSM state.
> > > > > 8. Profit.
> > > > >
> > > > > There is never a point where you want to follow a whole flow
> > > > > call
> > > > > by
> > > > > call going back and forth, and even if you did having the
> > > > > VDSM
> > > > > taskID is a better anchor then flowID.
> > > > 
> > > > not everything is a task, flow IDs would unify entire flows,
> > > > and
> > > > make
> > > > following them easy.
> > > > 
> > > > >
> > > > > VDSM is built in a way that every call takes in to account
> > > > > the
> > > > > current state only. Debugging it with an engine flow mindset
> > > > > is
> > > > > just
> > > > > wrong and distracting. I see it doing more harm the good by
> > > > > reinforcing bad debugging practices.
> > > > 
> > > > Maybe you're right, though I can't see how from my experience
> > > > so
> > > > far,
> > > > but following the flows is the only thing that got cases
> > > > resolved.
> > > > Not event IDs making every possible error, and not task IDs
> > > > (though
> > > > these do have their uses) - slow and meticulous mapping of
> > > > flows
> > > > to
> > > > log records.
> > > > 
> > > > >
> > > > > ----- Original Message -----
> > > > > > From: "Keith Robertson" <krobe...@redhat.com>
> > > > > > To: "VDSM Project Development"
> > > > > > <vdsm-devel@lists.fedorahosted.org>
> > > > > > Sent: Thursday, February 9, 2012 1:34:43 PM
> > > > > > Subject: Re: [vdsm] flowID schema
> > > > > >
> > > > > > On 02/09/2012 12:18 PM, Andrew Cathrow wrote:
> > > > > > >
> > > > > > > ----- Original Message -----
> > > > > > >> From: "Ayal Baron"<aba...@redhat.com>
> > > > > > >> To: "Dan Kenigsberg"<dan...@redhat.com>
> > > > > > >> Cc: "VDSM Project
> > > > > > >> Development"<vdsm-devel@lists.fedorahosted.org>
> > > > > > >> Sent: Monday, February 6, 2012 10:35:54 AM
> > > > > > >> Subject: Re: [vdsm] flowID schema
> > > > > > >>
> > > > > > >>
> > > > > > >>
> > > > > > >> ----- Original Message -----
> > > > > > >>> On Thu, Feb 02, 2012 at 10:32:49AM -0500, Saggi Mizrahi
> > > > > > >>> wrote:
> > > > > > >>>> flowID makes no sense after the initial API call as
> > > > > > >>>> stuff
> > > > > > >>>> like
> > > > > > >>>> cacheing\threadpools\samplingtasks\resources\asyncTasks
> > > > > > >>>> so
> > > > > > >>>> flowing
> > > > > > >>>> a flow like that will not give you the entire picture
> > > > > > >>>> while
> > > > > > >>>> debugging.
> > > > > > >>>>
> > > > > > >>>> Also adding it now will make everything even more
> > > > > > >>>> ugly.
> > > > > > >>>> You know what, just imagine I wrote one of my long
> > > > > > >>>> rambles
> > > > > > >>>> about
> > > > > > >>>> why I don't agree with doing this.
> > > > > > >>> I cannot imagine you write anything like that. Really.
> > > > > > >>> I
> > > > > > >>> do
> > > > > > >>> not
> > > > > > >>> understand why you object logging flowID on API entry
> > > > > > >>> point.
> > > > > > >> The question is, what problem is this really trying to
> > > > > > >> solve
> > > > > > >> and
> > > > > > >> is
> > > > > > >> there a simpler and less obtrusive solution to that
> > > > > > >> problem?
> > > > > > > correlating logs between ovirt engine and potentially
> > > > > > > multiple
> > > > > > > vdsm
> > > > > > > nodes is a nightmare. It requires a lot skill to follow a
> > > > > > > transaction through from the front end all the way to the
> > > > > > > node,
> > > > > > > and even multiple nodes (eg actions on spm, then actions
> > > > > > > on
> > > > > > > other
> > > > > > > node to run a vm).
> > > > > > > Having a way to correlate the logs and follow a single
> > > > > > > event/flow
> > > > > > > is vital.
> > > > > > >
> > > > > > +1
> > > > > >
> > > > > > Knowing what command caused a sequence of events in VDSM
> > > > > > would
> > > > > > be
> > > > > > really
> > > > > > helpful particularly in a threaded environment.  Further,
> > > > > > wouldn't
> > > > > > such
> > > > > > an ID be helpful in an asynchronous request/response model?
> > > > > >  I'm
> > > > > > not
> > > > > > sure what the plans are for AMQP or even if there are
> > > > > > plans,
> > > > > > but
> > > > > > I'd
> > > > > > think that something like this would be crucial for an
> > > > > > async
> > > > > > response.
> > > > > > So, if you implemented it you might be killing 2 birds with
> > > > > > 1
> > > > > > stone.
> > > > > >
> > > > > > FYI: If you want to see examples of other systems that use
> > > > > > similar
> > > > > > concepts, take a look at the correlation ID in JMS.
> > > > > >
> > > > > > Cheers,
> > > > > > Keith
> > > > > >
> > > > > >
> > > > > > >>> _______________________________________________
> > > > > > >>> vdsm-devel mailing list
> > > > > > >>> vdsm-devel@lists.fedorahosted.org
> > > > > > >>> https://fedorahosted.org/mailman/listinfo/vdsm-devel
> > > > > > >>>
> > > > > > >> _______________________________________________
> > > > > > >> vdsm-devel mailing list
> > > > > > >> vdsm-devel@lists.fedorahosted.org
> > > > > > >> https://fedorahosted.org/mailman/listinfo/vdsm-devel
> > > > > > >>
> > > > > > > _______________________________________________
> > > > > > > vdsm-devel mailing list
> > > > > > > vdsm-devel@lists.fedorahosted.org
> > > > > > > https://fedorahosted.org/mailman/listinfo/vdsm-devel
> > > > > >
> > > > > > _______________________________________________
> > > > > > vdsm-devel mailing list
> > > > > > vdsm-devel@lists.fedorahosted.org
> > > > > > https://fedorahosted.org/mailman/listinfo/vdsm-devel
> > > > > >
> > > > > _______________________________________________
> > > > > vdsm-devel mailing list
> > > > > vdsm-devel@lists.fedorahosted.org
> > > > > https://fedorahosted.org/mailman/listinfo/vdsm-devel
> > > > >
> > > > 
> > > > 
> > > > --
> > > > 
> > > > 
> > > > 
> > > > Regards,
> > > > 
> > > > Dan Yasny
> > > > Red Hat Israel
> > > > +972 9769 2280
> > > > _______________________________________________
> > > > vdsm-devel mailing list
> > > > vdsm-devel@lists.fedorahosted.org
> > > > https://fedorahosted.org/mailman/listinfo/vdsm-devel
> > > > 
> > > 
> > 
> > --
> > 
> > 
> > 
> > Regards,
> > 
> > Dan Yasny
> > Red Hat Israel
> > +972 9769 2280
> > 
> 

-- 



Regards, 

Dan Yasny 
Red Hat Israel 
+972 9769 2280
_______________________________________________
vdsm-devel mailing list
vdsm-devel@lists.fedorahosted.org
https://fedorahosted.org/mailman/listinfo/vdsm-devel

Reply via email to