Hi,
Several of us at Citrix are currently working on rewriting xenopsd -
the component of the Xapi toolstack which handles domain creation,
destruction, device attachment and so on - to use libxl instead of libxc.
One of the problems we have run into is that error reporting in libxl
is not detailed and expressive enough to let us take corrective action
or present meaningful high level error messages to users. In particular:
1 Many API calls return error codes, but these codes don't tell us
very much about what went wrong. In most cases the error code is
ERROR_FAIL, ERROR_INVAL or ERROR_NOMEM. The code may also have
been returned by a utility function and passed back unaltered by
the API call.
Examples: libxl_domain_resume, libxl_domain_remus_start (mostly
ERROR_FAIL). libxl_domain_rename uses ERROR_NOMEM, ERROR_INVAL
and ERROR_FAIL.
2 A few API calls return pointers. In some cases a null pointer
indicates that an error occured, but in others null could possibly be
a normal return value. If null does indicate an error, it still does
not give us any information about what happened.
Examples: libxl_list_domain, libxl_list_cpupool, libxl_list_vm
3 A few API calls have no return value but may still log errors.
Examples: libxl_disable_domain_death,
4 Most API calls log detailed error messages through XTL, but there
does not seem to be a way for the caller to make use of them. Some
error messages may also have been logged by utility functions, not
by the API call itself.
We would like to make libxl's error reporting more regular and informative
for callers.We think we need to:
* Make a list of the error conditions which can be encountered by
all the current API calls and define a set of error codes to
cover them. Some codes will be generally applicable, but give
more information than a bare 'fail'; others may be specific to a
particular API call. We may keep the existing FAIL, INVAL and
NOMEM codes for times when they are appropriate.
The error messages logged by each API call are a good starting point
for this list. I have included a partial list below.
* Change the API calls in point 1 above to use the new, more detailed
error codes. These changes will break client code which checks for
particular error codes, rather than just checking whether the return
code is non-zero.
* For the API calls in points 2 and 3 which can encounter errors
which callers might care about, change their interfaces to return
error codes. For functions which previously returned pointers,
add pointer-to-pointer parameters.
* For error codes returned by utility functions, described in point 4,
decide whether the code can be reported directly or an API-level
error should be reported instead.
These changes will certainly require a new version of the libxl API. We
have considered other options which would be less likely to break existing
code - in particular the option of keeping the existing return codes but
adding an errno-like field which could contain a more specific error number.
This approach would not break existing client code and would allow API calls
which return void or pointers to report errors. However there is no obvious
place to put the errno field - the ctx struct is not suitable because it might
be used by several callers simultaneously.The API-breaking change outlined
above seems to be the cleanest option, although it requires work on both libxl
and its clients.
All comments on this proposal are very welcome. Examples of API calls
which would not fit well into the new design would be particularly useful.
We are also keen to hear whether or not these changes would be useful
for libvirt and other toolstacks using libxl, and whether those toolstacks
have encountered any other error reporting problems which we have missed.
Thanks,
Euan
Notes on some API functions defined in libxl.c.
'ERROR_FOO, string' means that when ERROR_FOO is returned, string
is also logged. In most non-trivial functions, error reporting is
handled as follows:
* store the error code in 'rc' or a similar local variable
* log the error string
* goto the 'out' or 'out_err' label at the end of the function
libxl_ctx_alloc:
success: 0
failure:
ERROR_VERSION
ERROR_FAIL, Failed to initialize mutex
ERROR_FAIL, cannot open libxc handle
ERROR_FAIL, cannot connect to xenstore
ERROR_FAIL, cannot open libxc evtchn handle (from libxl__ctx_evtchn_init)
libxl_ctx_free:
success: 0
no errors
libxl_string_list_dispose:
no return value
no interesting failure modes apart from segfault
libxl_string_list_copy:
libxl_key_value_list_copy
no return value
if calloc fails, logs libxl: FATAL ERROR: memory allocation failure and
exits
if strdup fails, logs libxl: FATAL ERROR: memory allocation failure and
exits