Re: [Xen-devel] RFC: Proposal to improve error reporting in libxl

2015-05-21 Thread Ian Campbell
On Wed, 2015-05-20 at 10:01 +0100, Euan Harris wrote:
 We would like to make libxl's error reporting more regular and informative
 for callers.

Yes, I agree this is a good idea.

 We think we need to:
 
   * Make a list of the error conditions which can be encountered by
 all the current API calls and define a set of error codes to
 cover them.   Some codes will be generally applicable, but give
 more information than a bare 'fail'; others may be specific to a
 particular API call.   We may keep the existing FAIL, INVAL and 
 NOMEM codes for times when they are appropriate.
 
 The error messages logged by each API call are a good starting point
 for this list.   I have included a partial list below.
 
   * Change the API calls in point 1 above to use the new, more detailed
 error codes.   These changes will break client code which checks for
 particular error codes, rather than just checking whether the return
 code is non-zero.

I think that at least for the case of turning a uselessly generic
ERROR_FAIL into something specific this is tolerable.

I'm not sure if we will want a raft of LIBXL_HAVE_ERROR_* defines. It
seems not all that useful.

(Perhaps every enum value FOO in the IDL should get a #define FOO FOO to
make this automatic?)

   * For the API calls in points 2 and 3 which can encounter errors
 which callers might care about, change their interfaces to return
 error codes.   For functions which previously returned pointers,
 add pointer-to-pointer parameters.

I think this is doable using the LIBXL_API_VERSION arrangements
(described in libxl.h)

Essential for an existing
THING *get_a_thing_list(ctx, int *nr_r)
which you are converting to
int get_a_thing_list(ctx, THING **list_r, int *nr_r)

you would do something like:

#if LIBXL_API_VERSION  0x406000 /* or whatever version */
static inline THING *get_a_thing_list(ctx, int *nr_r)
{
THING *list = NULL
int rc = get_a_thing_list(ctx, list, nr_r);
if ( rc )
return NULL
else
return list;
}
#endif

Or something along those lines.

   * For error codes returned by utility functions, described in point 4, 
 decide whether the code can be reported directly or an API-level
 error should be reported instead.

I think this fits in with the cleanups made in the second bullet and is
likewise fine. (But worth considering separately I agree)

Ian.


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] RFC: Proposal to improve error reporting in libxl

2015-05-20 Thread Euan Harris
Hi,

Several of us at Citrix are currently working on rewriting xenopsd -
the component of the Xapi toolstack which handles domain creation,
destruction, device attachment and so on - to use libxl instead of libxc.
One of the problems we have run into is that error reporting in libxl
is not detailed and expressive enough to let us take corrective action
or present meaningful high level error messages to users.   In particular:

  1 Many API calls return error codes, but these codes don't tell us
very much about what went wrong.   In most cases the error code is
ERROR_FAIL, ERROR_INVAL or ERROR_NOMEM.   The code may also have
been returned by a utility function and passed back unaltered by
the API call.

Examples: libxl_domain_resume, libxl_domain_remus_start (mostly
  ERROR_FAIL).   libxl_domain_rename uses ERROR_NOMEM, ERROR_INVAL
  and ERROR_FAIL.

  2 A few API calls return pointers.   In some cases a null pointer
indicates that an error occured, but in others null could possibly be
a normal return value.   If null does indicate an error, it still does
not give us any information about what happened.

Examples: libxl_list_domain, libxl_list_cpupool, libxl_list_vm

  3 A few API calls have no return value but may still log errors. 

Examples: libxl_disable_domain_death,

  4 Most API calls log detailed error messages through XTL, but there
does not seem to be a way for the caller to make use of them.  Some
error messages may also have been logged by utility functions, not
by the API call itself.

We would like to make libxl's error reporting more regular and informative
for callers.We think we need to:

  * Make a list of the error conditions which can be encountered by
all the current API calls and define a set of error codes to
cover them.   Some codes will be generally applicable, but give
more information than a bare 'fail'; others may be specific to a
particular API call.   We may keep the existing FAIL, INVAL and 
NOMEM codes for times when they are appropriate.

The error messages logged by each API call are a good starting point
for this list.   I have included a partial list below.

  * Change the API calls in point 1 above to use the new, more detailed
error codes.   These changes will break client code which checks for
particular error codes, rather than just checking whether the return
code is non-zero.

  * For the API calls in points 2 and 3 which can encounter errors
which callers might care about, change their interfaces to return
error codes.   For functions which previously returned pointers,
add pointer-to-pointer parameters.

  * For error codes returned by utility functions, described in point 4, 
decide whether the code can be reported directly or an API-level
error should be reported instead.

These changes will certainly require a new version of the libxl API.   We
have considered other options which would be less likely to break existing
code - in particular the option of keeping the existing return codes but
adding an errno-like field which could contain a more specific error number.
This approach would not break existing client code and would allow API calls
which return void or pointers to report errors.   However there is no obvious
place to put the errno field - the ctx struct is not suitable because it might
be used by several callers simultaneously.The API-breaking change outlined
above seems to be the cleanest option, although it requires work on both libxl
and its clients.

All comments on this proposal are very welcome.   Examples of API calls
which would not fit well into the new design would be particularly useful.
We are also keen to hear whether or not these changes would be useful
for libvirt and other toolstacks using libxl, and whether those toolstacks
have encountered any other error reporting problems which we have missed.

Thanks,
Euan



Notes on some API functions defined in libxl.c.

'ERROR_FOO, string' means that when ERROR_FOO is returned, string
is also logged.   In most non-trivial functions, error reporting is
handled as follows:
   * store the error code in 'rc' or a similar local variable 
   * log the error string
   * goto the 'out' or 'out_err' label at the end of the function
   

libxl_ctx_alloc:
  success: 0
  failure:
ERROR_VERSION
ERROR_FAIL, Failed to initialize mutex
ERROR_FAIL, cannot open libxc handle
ERROR_FAIL, cannot connect to xenstore
ERROR_FAIL, cannot open libxc evtchn handle (from libxl__ctx_evtchn_init) 

libxl_ctx_free:
  success: 0
  no errors
  
libxl_string_list_dispose:
  no return value
  no interesting failure modes apart from segfault

libxl_string_list_copy:
libxl_key_value_list_copy
  no return value
  if calloc fails, logs libxl: FATAL ERROR: memory allocation failure and 
exits
  if strdup fails, logs libxl: FATAL ERROR: memory allocation failure and 
exits