Re: [pkg-discuss] [caiman-discuss] cross-project review: Draft BE Error observability design

Evan Layton Fri, 21 Aug 2009 09:20:10 -0700

After thinking about this a bit more, I was getting bogged down in the idea that it was a requirement to keep the interfaces the same. However, this was causing us to work around things in a way that limited how we would do things going forward. I think we should be doing something along the lines of using a descriptor to pass the error information.

While this requires a change to the interface, it also gives us the ability to make this more extensible as well as opaque.
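
To make the descriptor idea a bit more concrete, here is a minimal sketch in C. It is only an illustration, not the design from the document: every name in it (be_err_desc_t, be_err_desc_create() and so on) is hypothetical and not part of libbe. The point is that callers only ever see the typedef and the accessor functions, so fields can be added later without changing how callers use it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names throughout; none of these exist in libbe. */
typedef struct be_err_desc be_err_desc_t;   /* opaque to callers */

/* In a real library this definition would live in a private source file. */
struct be_err_desc {
    int   code;     /* error code of the failed operation */
    char *message;  /* description, owned by the descriptor */
};

be_err_desc_t *be_err_desc_create(int code, const char *message)
{
    assert(code != 0);  /* a descriptor always describes a failure */
    be_err_desc_t *d = calloc(1, sizeof (*d));
    if (d == NULL)
        return NULL;
    d->code = code;
    if (message != NULL) {
        size_t len = strlen(message) + 1;
        d->message = malloc(len);
        if (d->message == NULL) {
            free(d);
            return NULL;
        }
        memcpy(d->message, message, len);
    }
    return d;
}

/* Accessors: the only way callers read the descriptor. */
int be_err_desc_code(const be_err_desc_t *d)
{
    return (d != NULL) ? d->code : 0;
}

const char *be_err_desc_message(const be_err_desc_t *d)
{
    return (d != NULL && d->message != NULL) ? d->message : "";
}

void be_err_desc_destroy(be_err_desc_t *d)
{
    if (d == NULL)
        return;
    free(d->message);
    free(d);
}
```

Because the layout is hidden, a later version could add more fields and new accessors while existing callers keep working unchanged.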

I'm updating the document now to address these issues and will send out an updated version when I have that complete.


Thanks!
-evan

Evan Layton wrote:

My apologies for not getting all of the responses in the first email, I hit send too soon...
[email protected] wrote:
On Thu, Aug 20, 2009 at 04:31:21PM -0600, Evan Layton wrote:
In my high-level comments, I observed that this structure doesn't allow
libbe to grow or add any other error information beyond what's provided
here. At a minimum, there should be a field, or linked list, that allows
callers in the error handling chain to add ancillary information as an
operation progresses.  The initial failure may be that a device was in
use, but as the error is propagated up the call chain, there may be
other failures that occur too.  How do you plan to handle this case?
Augmenting the error object with other information is one approach, but
there are certainly others.
Anything that happens after this failure is not something we necessarily want to pass back through to a user. We don't want to give them more information than they really need and should stop at the first error that caused us to fail. This was discussed with Frank and it was his suggestion that we don't provide too much information and confuse the user.
I'm not sure this addresses my comment, really.  What if it would be
useful to add additional failure information in the call chain?  It
could help with debugging in some situations.  Your plan is not to
handle this at all?  What if you get two distinct errors performing an
operation.  Should you chain them together, raise them separately, or
just report the first?
I originally had the idea of what I termed "clean-up" errors and error strings for errors that may happen while doing any cleanup needed due to hitting the original error. I had removed them based on comments from Frank; however, I can see your point now that just because we keep track of this information does not mean that we have to display it to the user...
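
As a rough sketch of how the original failure and any "clean-up" errors could be kept together, here is one possibility in C using a simple linked list. All names (be_err_node_t, be_err_chain_add() and friends) and the error codes are made up for illustration and are not libbe interfaces. New entries go on the tail, so the first node is always the error that caused the failure, and a consumer that only wants the first error can ignore the rest.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical chain node; not a libbe type. */
typedef struct be_err_node {
    int  code;        /* error code for this failure */
    int  is_cleanup;  /* 0 = original failure, 1 = failure during clean-up */
    char msg[128];    /* short description, truncated if too long */
    struct be_err_node *next;
} be_err_node_t;

/* Append to the tail so the head stays the original failure. */
int be_err_chain_add(be_err_node_t **head, int code, int is_cleanup,
    const char *msg)
{
    assert(head != NULL);
    be_err_node_t *n = calloc(1, sizeof (*n));
    if (n == NULL)
        return -1;
    n->code = code;
    n->is_cleanup = is_cleanup;
    snprintf(n->msg, sizeof (n->msg), "%s", (msg != NULL) ? msg : "");
    while (*head != NULL)
        head = &(*head)->next;
    *head = n;
    return 0;
}

/* Number of errors recorded in the chain. */
size_t be_err_chain_len(const be_err_node_t *head)
{
    size_t n = 0;
    for (; head != NULL; head = head->next)
        n++;
    return n;
}

/* Free every node in the chain. */
void be_err_chain_free(be_err_node_t *head)
{
    while (head != NULL) {
        be_err_node_t *next = head->next;
        free(head);
        head = next;
    }
}
```

A tool printing a message for the user could show only the head of the chain, while a log or another framework could walk the whole list.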
Given the response, I should point out that users may not be the primary
consumer of this information.  Yes, we need to pretty print an error
message at some point, but obtaining relevant and useful debugging /
problem solving information is critical.  Other frameworks may want to
use this to determine how to handle more complicated error conditions.
OK, I think this is the point I was not getting the first time around. What you're saying is that the information we may want to collect may be much more extensive than just the information we would pass to a user.
I guess the kind of thing you're thinking of is that, with this added information, it may be possible for consumers of, say, libbe to get back enough information that things like beadm or pkg may be able to take corrective action based on what we pass back and fix the problem, so the user doesn't have to do anything to fix the issue.
I can agree with this and I'll look at this a bit more. However, I'm not sure how much of this is really out of scope for this project.
There are other extensibility issues to consider.  In particular, this
format is fixed to contain 5 pointers to char *.  I think it would make
more sense to define a generic err_info structure that contains a code
that describes the type of error that follows.
Error codes are separate from this error structure and are returned by the libbe functions themselves. Changing this would cause a major interface change and require all the consumers of the current library and Python bridge to change.
I'm not trying to suggest that you move your return codes into this
structure, rather I'm suggesting that one possibility for making the
format extensible is to use a value to determine what type of structure
you're looking at.  That way it's possible to have multiple types of
err_info_t's that contain different information about different errors,
if it ever becomes necessary to build different classes of error
information.
So what you're really referring to here is more along the idea of having a different structure for each type of error we could possibly hit? Or is it more that we would, for example, have different structures for things like errors internal to the library (libbe), a structure for zfs errors and one for other things outside the library? At this point I don't see the need for that as far as this project goes, since these are all fairly similar. Also, the way libbe handles errors is that when we hit a failure we stop, clean up and return, so the only errors that may also need to be captured are any errors while cleaning up. One example of this would be something like a BE creation that fails doing the dataset cloning, and then destroying all the snapshots for it also fails. We would want to gather that information as well.
The sockaddr structure is a good example of something similar.
Depending upon what socket method is being used, the pointer to
sockaddr is copied in and then read as a particular structure.  I can
imagine that we might want to have multiple different types of
err_info structures, but the current approach precludes that option.
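
For readers who have not seen the sockaddr pattern, a minimal C sketch of the same idea applied to error information follows. The type names, the enum values and the two variants are hypothetical and chosen only for illustration; nothing here is taken from the libbe design. Every variant starts with the same small header holding a type value, and the reader checks that value before casting to the matching structure.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical type tags, playing the role of sa_family in sockaddr. */
typedef enum {
    BE_ERR_INFO_GENERIC = 1,
    BE_ERR_INFO_ZFS     = 2
} be_err_info_type_t;

/* Common header that every variant carries as its first member. */
typedef struct be_err_info {
    be_err_info_type_t type;
} be_err_info_t;

typedef struct be_err_info_generic {
    be_err_info_t hdr;      /* must be first */
    const char   *message;
} be_err_info_generic_t;

typedef struct be_err_info_zfs {
    be_err_info_t hdr;      /* must be first */
    const char   *dataset;  /* dataset the failure relates to */
    int           zfs_errno;
} be_err_info_zfs_t;

/*
 * Return the string that best identifies the failure. The cast is valid
 * because a pointer to a struct's first member can be converted back to
 * a pointer to the enclosing struct.
 */
const char *be_err_info_subject(const be_err_info_t *info)
{
    assert(info != NULL);
    switch (info->type) {
    case BE_ERR_INFO_GENERIC:
        return ((const be_err_info_generic_t *)info)->message;
    case BE_ERR_INFO_ZFS:
        return ((const be_err_info_zfs_t *)info)->dataset;
    }
    return NULL;    /* unknown type: caller falls back to the error code */
}
```

A new kind of error information then needs only a new tag value and a new structure; callers that do not recognize the tag can still fall back to the plain error code.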
I still don't see why you would need more than one type, since you will be putting the error information you want into the available strings and passing back the error code separately from the structure. You can put whatever you want into the strings within the structure.
I think you've misunderstood me.  We may both think that five strings
are sufficient to solve the problem today.  The point is that in the
future we may want six; or three strings, two integers, a double, and a
pointer to another struct.  The idea is that you want a structure that
is flexible so that if the error reporting needs change, you can deliver
the proper information to libbe's callers without requiring them to
change the way they use the interface.  What I'm trying to ask is,
"What's your approach for coping with both planned and unexpected additions
to this structure?"
This, in relation to your other comments, now makes more sense to me. We may want to return as much information as possible to the consumer of the library. It appears that what you are asking for is that we return information from the library to the consumer that includes not just information about what failed and what the error string may have been, but also information that is really internal to the library, like maybe what the contents of a sockaddr was or the zfs handle of the dataset that is failing. I think for the purposes of this project that is a bit out of scope, but it is definitely something we should be thinking about for the Caiman Unified Design (CUD) and the error handling it is intended to do.
Please keep in mind that the error handling we are talking about here is for libbe only and does not address the full error handling for CUD. While the ideas you've expressed are definitely things to think about for CUD, they are not all things we will be able to address here.
The idea of adding several types of error structures is a good idea, but I'd suggest that these be based on the type of consumer of the error handler. In other words, there would be libbe, AI, DC, etc. structures that would be able to handle the different types of errors for these components. We would want to be able to make it fairly straightforward to add new types. However, this is also out of scope for this project.
The one thing I do agree with you on here is the need to have the structure that we use be something that can be changed without too much pain as we move from what we are doing here to what will be done for CUD.
To answer your question on "coping with both planned and unexpected additions", I don't think I have a complete answer yet. Any suggestions in addition to what you've already mentioned are definitely more than welcome!!!
In addition to refining the interface, I would make sure that you have a
good way of constructing and destroying these objects.  If they're tied
to a boot-environment specific handle that's great.  Otherwise, you'll
have to make sure that the caller calls some destroy_my_error_object()
routine to get these cleaned up.  It's not optimal, especially since
some applications are haphazardly constructed.
That's one of the ways of doing this that Ethan and I had been talking
about yesterday. The idea of creating a handle to contain the structure
and we'd be able to pass the information back inside the handle and free
up the memory when the handle is destroyed. The only problem we have
with this right now is that all of the interfaces use nvlists so passing
a handle from the library and back into it would also require an
interface change that would break anything already using the library.
I guess I was thinking a bit more generically, but what you describe
makes sense.  Typically, the handle is an opaque struct that contains
object specific information.  If you had a handle per-be and did your
operations in terms of a be, then I suppose it might be easier to
implement this.
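
A minimal sketch of the per-BE handle approach in C is shown below, under the assumption that operations are done in terms of one boot environment. The names (be_handle_t, be_handle_open(), be_handle_set_error(), be_handle_close()) are hypothetical and are not existing libbe interfaces. The error data is owned by the handle, so closing the handle frees it and the caller has no separate error object to destroy.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-BE handle; it would be opaque in a real header. */
typedef struct be_handle {
    char *be_name;   /* boot environment this handle refers to */
    int   err_code;  /* 0 when no error is recorded */
    char *err_msg;   /* owned by the handle, freed in be_handle_close() */
} be_handle_t;

/* Heap copy of a string; returns NULL on allocation failure. */
static char *be_copy_string(const char *s)
{
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);
    if (copy != NULL)
        memcpy(copy, s, len);
    return copy;
}

be_handle_t *be_handle_open(const char *be_name)
{
    assert(be_name != NULL);
    be_handle_t *h = calloc(1, sizeof (*h));
    if (h == NULL)
        return NULL;
    h->be_name = be_copy_string(be_name);
    if (h->be_name == NULL) {
        free(h);
        return NULL;
    }
    return h;
}

/* Record an error on the handle, replacing any earlier one. */
void be_handle_set_error(be_handle_t *h, int code, const char *msg)
{
    assert(h != NULL && msg != NULL);
    free(h->err_msg);
    h->err_msg = be_copy_string(msg);   /* may be NULL if out of memory */
    h->err_code = code;
}

const char *be_handle_error_msg(const be_handle_t *h)
{
    return (h->err_msg != NULL) ? h->err_msg : "";
}

/* Closing the handle releases the error data too. */
void be_handle_close(be_handle_t *h)
{
    if (h == NULL)
        return;
    free(h->err_msg);
    free(h->be_name);
    free(h);
}
```

The cost is the one already noted above: the current interfaces take nvlists, so passing a handle in and out would be an interface change.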

On the other hand, if the majority of the libbe consumers are going to
want to change to take advantage of the additional error reporting,
making the change to use handles at the same time might not be so
disruptive.
One of the requirements we have at the moment is that adding this should not break existing callers of the library. The use of something like a global structure or something along those lines may be one way to allow us to do that.
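
One way the global-structure idea could look, sketched in C in the style of errno: the existing entry points keep their signatures and return codes, and the extra detail is stored on the side for callers that choose to ask for it. Everything here is an assumption made for illustration: be_example_op() is a stand-in for an existing library function, and be_last_err_code() and be_last_err_msg() are hypothetical additions. A real library would also need per-thread storage rather than a plain global.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical library-wide record of the most recent failure. */
static struct {
    int  code;
    char msg[256];
} g_be_last_err;

/* Store the detail and hand the code back so callers can return it. */
static int be_set_last_err(int code, const char *msg)
{
    assert(code != 0 && msg != NULL);
    g_be_last_err.code = code;
    snprintf(g_be_last_err.msg, sizeof (g_be_last_err.msg), "%s", msg);
    return code;
}

/* Stand-in for an existing entry point: signature and return code unchanged. */
int be_example_op(const char *be_name)
{
    /* Clear the record so stale detail never outlives a success. */
    g_be_last_err.code = 0;
    g_be_last_err.msg[0] = '\0';

    if (be_name == NULL || be_name[0] == '\0')
        return be_set_last_err(1, "no boot environment name given");
    return 0;
}

/* New, optional queries; existing callers never have to call these. */
int be_last_err_code(void)
{
    return g_be_last_err.code;
}

const char *be_last_err_msg(void)
{
    return g_be_last_err.msg;
}
```

Existing callers that only check the return value see no change, while updated callers can read the extra detail after a failure.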
Thanks,

-j
_______________________________________________
caiman-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/caiman-discuss


_______________________________________________
pkg-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/pkg-discuss
