After thinking about this a bit more, I was getting bogged down in the idea that is was a requirement to keep the interfaces the same. However this was causing us to work around things a way that limited how we would do things going forward. I think we should be doing something along the lines of using a descriptor to pass the error information.

While this requires a change to the interface it also gives us the ability to make this more extensible as well as opaque.

I'm updating the document now to address this issues and will send out an updated version when I have that complete.

Thanks!
-evan

Evan Layton wrote:
My apologies for not getting all of the responses in the first email, I hit send too soon...

[email protected] wrote:
On Thu, Aug 20, 2009 at 04:31:21PM -0600, Evan Layton wrote:
In my high-level comments, I observed that this structure doesn't allow
libbe to grow or add any other error information beyond what's provided
here. At a minimum, there should be a field, or linked list, that allow
callers in the error handling chain to add ancillary information as an
operation progresses.  The initial failure may be that a device was in
use, but as the error is propagated up the call chain, there may be
other failures that occur too.  How do you plan to handle this case?
Augmenting the error object with other information is one approach, but
there are certainly others.
Anything that happens after this failure it not something we necessarily want to pass back back through to a user. We don't want to give them more information than they really need and should stop at the first error that caused us to fail. This was discussed with Frank and it was his suggestion that we don't provide too much information and confuse the user.

I'm not sure this adresses my comment, really.  What if it would be
useful to add additional failure information in the call chain?  It
could help with debugging in some situations.  Your plan is not to
handle this at all?  What if you get two distinct errors performing an
operation.  Should you chain them together, raise them separately, or
just report the first?

I originally had the idea of what I termed "clean-up" errors and error strings for errors that may happen while doing any cleanup needed due to hitting the original error. I had removed them based on comments from Frank, however can see your point now that just because we keep track of this information does not mean that we have to display that to the user...


Given the response, I should point out that users may not be the primary
consumer of this information.  Yes, we need to pretty print an error
message at some point, but obtaining relevant and useful debugging /
problem solving information is critical.  Other frameworks may want to
use this to determine how to handle more complicated error conditions.

OK I think this is the point I was not getting a clue on the first time around. What you're saying is that the information we may want to collect may be much more extensive that just the information we would pass to a user.

I guess the kind of thing you're thinking of is, that with this added information it may be possible for consumers of say libbe to get back enough information that things like beadm or pkg may be able to take corrective action based on what we pass back, fix the problem and the user doesn't have to do anything to fix the issue.

I can agree with this and I'll look at this a bit more. However I'm not sure how much of this is really out of scope for this project.


There are other extensibility issues to consider.  In particular, this
format is fixed to contain 5 pointers to char *.  I think it would make
more sense to define a generic err_info structure that contains a code
that describes the type of error that follows.
Error code are separate from this error structure and are returned by the libbe functions themselves. Changing this would cause a major interface change and require all the consumers of the current library and python bridge to change.

I'm not trying to suggest that you move your return codes into this
structure, rather I'm suggesting that one possibility for making the
format extensible is to use a value to determine what type of structure
you're looking at.  That way it's possible to have multiple types of
err_info_t's that contain different information about different errors,
if it ever becomes necessary to build different classes of error
information.

So what you're really referring to here is more along the idea of having a different structure for each type of error would could possibly hit? Or is it more that we would, for example, have different structures for things like errors internal to the library (libbe), a structure for zfs errors and one for other things outside the library? At this point I don't see the need for that as far a this project since these are all fairly similar. Also the way libbe handle errors is when we hit a failure we stop, clean up and return so the only errors that may also need to captured are any errors while cleaning up. One example of this would be something like a BE creation that fails doing the dataset cloning and then destroying all the snapshots for it also fails. We would want to gather that information as well.


The sockaddr structure is a good example of something similar.
Depending upon what socket method is being used, the pointer to
sockaddr is copied in and then read as a particular structure.  I can
imagine that we might want to have multiple different types of
err_info structures, but the current approach precludes that option.
I still don't see why you would need more than one type since you will be puuting the error information you want into the available strings and passing backe the error code separately from the structure. You can put what ever you want into the strings within the structure.

I think you've misunderstood me.  We may both think that five strings
are sufficient to solve the problem today.  The point is that in the
future we may want six; or three strings, two integers, a double, and a
pointer to another struct.  The idea is that you want a structure that
is flexible so that if the error reporting needs change, you can deliver
the proper information to libbe's callers without requiring them to
change the way they use the interface.  What I'm trying to ask is,
"What's your approach for coping with both planned and unexpected additions
to this structure?"

This in relationship to you other comments now make more sense to me. We may want to return as much information as possible to the consumer of the library. It appears that what you are asking for is that we return information from the library to the consumer that includes not just information about what failed and what the error string may have been but also information that is really internal to the library like maybe what the contents of a sockaddr was or the zfs handle of the dataset that is failing. I think for the orposes of this project that is a bit out of scope but is definitely something we should be thinking about for the Caiman Unified Design (CUD) and the error handling it is intended to do.

Please keep in mind that the error handling we are talking about here is for libbe only and does not address the full error handling for CUD. While the ideas you've expressed are definitely things to think about for CUD they are not all things we will be able to address here.

The idea of adding several types of error structures is a good idea but I'd suggest that these be based on the type of consumer of the error handler. In other word there would be libbe, AI, DC, etc. structures that would be able to handle the different type of errors for these components. We would want to be able to make it fairly strait forward to add need types. However this is also out of scope for this project.

The one thing I do agree with you here is the need to have the structure that we use be something that can be changed without too much pain as we move from what we are doing here to what will be done for CUD.

To answer your question on "coping with both planned and unexpected additions" I don't think I have a complete answer yet. Any suggestions in addition to what you've already mentioned are definitely more than welcome!!!



In addition to refining the interface, I would make sure that you have a
good way of constructing and destroying these objects.  If they're tied
to a boot-environment specifc handle that's great.  Otherwise, you'll
have to make sure that the caller calls some destroy_my_error_object()
routine to get these cleaned up.  It's not optimal, especially since
some applications are haphazardly constructed.
That's one of the ways of doing this that Ethan and I had been talking
about yesterday. The idea of creating a handle to contain the structure
and we'd be able to pass the information back inside the handle and free
up the memory when the handle is destroyed. The only problem we have
with this right now is that all of the interfaces use nvlists so passing
a handle from the library and back into it would also require an
interface change that would brake anything already using the library.

I guess I was thinking a bit more generically, but what you describe
makes sense.  Typically, the handle is an opaque struct that contains
object specific information.  If you had a handle per-be and did your
operations in terms of a be, then I suppose it might be easier to
implement this.

On the other hand, if the majority of the libbe consumers are going to
want to change to take advantage of the additonal error reporting,
making the change to use handles at the same time might not be so
disruptive.

One of the requirements we have at the moment is that adding this should not break existing callers of the library. The use of something like a global structure or something those lines may be one way to allow us to do that.


Thanks,

-j

_______________________________________________
caiman-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/caiman-discuss

_______________________________________________
pkg-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/pkg-discuss

Reply via email to