Steven Dake a écrit :
On Tue, 2010-03-30 at 09:57 +0200, Jerome Flesch wrote:
Patch looks good for commit
Committed.
responses to questions inline
On Mon, 2010-03-29 at 11:10 +0200, Jerome Flesch wrote:
Hello,
I just had a weird case where SaClmInitialize() returned the value 0 to
me. Since this value isn't valid for a enum SaAisErrorT, I investigated
and I found out that corosync/coroipcc.c:coroipcc_service_connect() can
return 0 in case Corosync was compiled with _POSIX_THREAD_PROCESS_SHARED
<= 0 (ie on a FreeBSD system) when there is a problem with the IPC
semaphores. This case is actually likely to happen on a FreeBSD system
due to IPC default limits.
In the process, I also changed the way the variable 'res' was used:
Since it's an enum, I've assumed it doesn't really make sense to use it
to store return values of system calls, so I replaced it by sys_res.
Also, I was wondering if there could be a way to get more details when
coroipcc_service_connect() fails. CS_ERR_LIBRARY and CS_ERR_TRY_AGAIN
are not exactly what I would call meaningful error codes ... It's not
very handy to have to try to guess if it failed because permissions on
the Unix socket are wrong or because IPC limits have been reached or
because an upper layer failed for some reason. I assume adding error
codes like CS_ERR_SEM, CS_ERR_SHM, could be an option but letting such
level of implementation details be visible in the error codes is not a
valid option, right ?
yes i prefer to have less error codes with no implementation details.
Right now, when I have to diagnose this kind of issue, I'm running a
patched version of Flatiron including the following macro (in lib/util.h):
+#ifdef DEBUG
+#define DPRINTX(...) do { \
+ fprintf(stderr, "Corosync lib: %s:L%d: ", __FILE__, __LINE__); \
+ fprintf(stderr, __VA_ARGS__); \
+ fprintf(stderr, "\n"); \
+ } while (0)
+#else
+#define DPRINTX(...)
+#endif
Is there a more adequate way of getting debug information from the
libraries themselves ?
I was thinking of a slightly more elegant way of doing it. I could add
functions like for example: 'void coroipcc_set_lasterror(const char
*str)', 'const char *coroipcc_get_lasterror(void)' ? Using
pthread_setspecific(), it would be possible to make it thread-safe. Do
you think it would be worth I spend some time on making a patch for that ?
sounds interesting. I wouldn't mind some diagnostic capability coming
out of the apis, but the thread safety was a problem I didn't know how
to solve. After reading the man page for pthread_setspecific, it looks
like it might solve the problem.
I would prefer apis something like this in coroipcc.c
static void coroipcc_lasterror_set (const char *fmt, ...);
It makes sense. It would allow to get more details like for instance
strerror(errno).
and in coroipcc.h
extern const char *coroipcc_lasterror_get (void).
Curious how you plan to create the key and garbage collect the
pthread_key values...
It's pretty simple actually: pthread_key_create() allows to attach a
destructor to a pthread key. This destructor is called each time a
thread stops, with as argument the pointer stored with
pthread_setspecific for this thread. So if we store a simple buffer with
pthread_setspecific, we can simply attach 'free()' to the key :)
How do you pthread_key_create? Use pthread_once in each api call? I
want to avoid a global init function for apis. pthread_once may have
performance implications if it takes a mutex each time it runs. Another
option is to use a constructor section in the library.
Like to see a patch..
And here it is. Of course, it's more a draft than a definitive patch.
Also, it would be nice to know if we go for this kind of implementation
or if we go for Angus' idea using a callback.
FYI, with this patch, when I run test/cpgverify without having a
Corosync running, I get:
% ./cpgverify
Couldn't initialize CPG service 6 : coroipcc.c, L750: connect() failed
(61: Connection refused)
Regards
-steve
This could have good implications for the other apis beyond coroipcc...
Regards
-steve
_______________________________________________
Openais mailing list
[email protected]
https://lists.linux-foundation.org/mailman/listinfo/openais
diff --git a/include/corosync/coroipcc.h b/include/corosync/coroipcc.h
index bc912ca..fb7c940 100644
--- a/include/corosync/coroipcc.h
+++ b/include/corosync/coroipcc.h
@@ -117,6 +117,8 @@ coroipcc_zcb_msg_send_reply_receive (
void *res_msg,
size_t res_len);
+extern const char *coroipcc_lasterror_get(void);
+
#ifdef __cplusplus
}
#endif
diff --git a/lib/coroipcc.c b/lib/coroipcc.c
index 06e4d75..825bf60 100644
--- a/lib/coroipcc.c
+++ b/lib/coroipcc.c
@@ -37,6 +37,7 @@
#include <config.h>
+#include <stdarg.h>
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
@@ -116,6 +117,67 @@ static void socket_nosigpipe(int s)
#define MSG_NOSIGNAL 0
#endif
+#define LASTERROR_BUFFER_SIZE 512
+
+static pthread_once_t once_control = PTHREAD_ONCE_INIT;
+static pthread_key_t lasterror_key;
+
+static void lasterror_init (void)
+{
+ pthread_key_create (&lasterror_key, free);
+}
+
+static inline char *coroipcc_lasterror_get_buffer (void)
+{
+ if (pthread_once (&once_control, lasterror_init) < 0)
+ return NULL;
+ return pthread_getspecific (lasterror_key);
+}
+
+static void _coroipcc_lasterror_set (const char *file, int line, int err, const char *fmt, ...) \
+ __attribute__ ((format (printf, 4, 5)));
+#define coroipcc_lasterror_set(...) _coroipcc_lasterror_set (__FILE__, __LINE__, 0, __VA_ARGS__)
+#define coroipcc_lasterror_perror(...) _coroipcc_lasterror_set (__FILE__, __LINE__, errno, __VA_ARGS__)
+
+static void _coroipcc_lasterror_set (const char *file, int line, int err, const char *fmt, ...)
+{
+ va_list ap;
+ char *buffer;
+ int len;
+
+ buffer = coroipcc_lasterror_get_buffer ();
+ if (buffer == NULL)
+ {
+ buffer = malloc (LASTERROR_BUFFER_SIZE);
+ if (buffer == NULL || pthread_setspecific (lasterror_key, buffer) < 0)
+ {
+ free (buffer);
+ return;
+ }
+ buffer[0] = '\0';
+ }
+
+ snprintf (buffer, LASTERROR_BUFFER_SIZE, "%s, L%u: ", file, line);
+ len = strlen (buffer);
+
+ va_start (ap, fmt);
+ vsnprintf (buffer + len, LASTERROR_BUFFER_SIZE - len, fmt, ap);
+ va_end (ap);
+
+ if (err > 0)
+ {
+ char strerr[128];
+ strerror_r (err, strerr, sizeof(strerr));
+ len = strlen (buffer);
+ snprintf (buffer + len, LASTERROR_BUFFER_SIZE - len, " (%u: %s)", errno, strerr);
+ }
+}
+
+const char *coroipcc_lasterror_get (void)
+{
+ return coroipcc_lasterror_get_buffer ();
+}
+
static cs_error_t
socket_send (
int s,
@@ -157,6 +219,7 @@ retry_send:
goto retry_send;
break;
default:
+ coroipcc_lasterror_perror ("sendmsg() failed");
res = CS_ERR_LIBRARY;
goto res_exit;
}
@@ -214,6 +277,7 @@ retry_recv:
goto retry_recv;
break;
default:
+ coroipcc_lasterror_perror ("recvmsg() failed");
res = CS_ERR_LIBRARY;
goto res_exit;
}
@@ -223,6 +287,7 @@ retry_recv:
* EOF is detected when recvmsg return 0.
*/
if (result == 0) {
+ coroipcc_lasterror_set ("recvmsg(): BSD EOF");
res = CS_ERR_LIBRARY;
goto res_exit;
}
@@ -295,6 +360,7 @@ circular_memory_map (char *path, const char *file, void **buf, size_t bytes)
sprintf (path, LOCALSTATEDIR "/run/%s", file);
fd = mkstemp (path);
if (fd == -1) {
+ coroipcc_lasterror_perror ("mkstemp(%s) failed", path);
return (-1);
}
}
@@ -305,6 +371,7 @@ circular_memory_map (char *path, const char *file, void **buf, size_t bytes)
MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
if (addr_orig == MAP_FAILED) {
+ coroipcc_lasterror_perror ("mmap() failed");
return (-1);
}
@@ -312,6 +379,11 @@ circular_memory_map (char *path, const char *file, void **buf, size_t bytes)
MAP_FIXED | MAP_SHARED, fd, 0);
if (addr != addr_orig) {
+ if (addr == MAP_FAILED)
+ coroipcc_lasterror_perror ("mmap() failed");
+ else
+ coroipcc_lasterror_set ("mmap() didn't map the memory at the expected address (%p instead of %p)",
+ addr, addr_orig);
return (-1);
}
#ifdef COROSYNC_BSD
@@ -327,6 +399,7 @@ circular_memory_map (char *path, const char *file, void **buf, size_t bytes)
res = close (fd);
if (res) {
+ coroipcc_lasterror_perror("close() failed");
return (-1);
}
*buf = addr_orig;
@@ -367,6 +440,7 @@ memory_map (char *path, const char *file, void **buf, size_t bytes)
sprintf (path, LOCALSTATEDIR "/run/%s", file);
fd = mkstemp (path);
if (fd == -1) {
+ coroipcc_lasterror_perror ("mkstemp(%s) failed", path);
return (-1);
}
}
@@ -377,6 +451,7 @@ memory_map (char *path, const char *file, void **buf, size_t bytes)
MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
if (addr_orig == MAP_FAILED) {
+ coroipcc_lasterror_perror ("mmap() failed");
return (-1);
}
@@ -384,6 +459,11 @@ memory_map (char *path, const char *file, void **buf, size_t bytes)
MAP_FIXED | MAP_SHARED, fd, 0);
if (addr != addr_orig) {
+ if (addr == MAP_FAILED)
+ coroipcc_lasterror_perror ("mmap() failed");
+ else
+ coroipcc_lasterror_set ("mmap() didn't map the memory at the expected address (%p instead of %p)",
+ addr, addr_orig);
return (-1);
}
#ifdef COROSYNC_BSD
@@ -392,6 +472,7 @@ memory_map (char *path, const char *file, void **buf, size_t bytes)
res = close (fd);
if (res) {
+ coroipcc_lasterror_perror ("close() failed");
return (-1);
}
*buf = addr_orig;
@@ -415,6 +496,9 @@ msg_send (
for (i = 0; i < iov_len; i++) {
if ((req_buffer_idx + iov[i].iov_len) >
ipc_instance->request_size) {
+ coroipcc_lasterror_set ("req_buffer_idx (%u) + iov[i].iov_len (%u)"
+ " > ipc_instance->request_size (%u)",
+ req_buffer_idx, iov[i].iov_len, ipc_instance->request_size);
return (CS_ERR_INVALID_PARAM);
}
memcpy (&ipc_instance->request_buffer[req_buffer_idx],
@@ -426,6 +510,7 @@ msg_send (
#if _POSIX_THREAD_PROCESS_SHARED > 0
res = sem_post (&ipc_instance->control_buffer->sem0);
if (res == -1) {
+ coroipcc_lasterror_perror ("sem_post() failed");
return (CS_ERR_LIBRARY);
}
#else
@@ -447,6 +532,7 @@ retry_semop:
goto retry_semop;
} else
if (res == -1) {
+ coroipcc_lasterror_perror ("semop() failed");
return (CS_ERR_LIBRARY);
}
#endif
@@ -480,6 +566,8 @@ retry_semwait:
poll (&pfd, 1, 0);
if (pfd.revents == POLLERR || pfd.revents == POLLHUP) {
+ coroipcc_lasterror_set ("EOF: poll() returned %s",
+ (pfd.revents == POLLERR ? "POLLERR" : "POLLHUP"));
return (CS_ERR_LIBRARY);
}
@@ -508,6 +596,7 @@ retry_semop:
goto retry_semop;
} else
if (res == -1) {
+ coroipcc_lasterror_perror ("semop() failed");
return (CS_ERR_LIBRARY);
}
#endif
@@ -546,6 +635,8 @@ retry_semwait:
poll (&pfd, 1, 0);
if (pfd.revents == POLLERR || pfd.revents == POLLHUP) {
+ coroipcc_lasterror_set ("poll() returned %s",
+ (pfd.revents == POLLERR ? "POLLERR" : "POLLHUP"));
return (CS_ERR_LIBRARY);
}
@@ -574,6 +665,7 @@ retry_semop:
goto retry_semop;
} else
if (res == -1) {
+ coroipcc_lasterror_perror ("semop() failed");
return (CS_ERR_LIBRARY);
}
#endif
@@ -630,6 +722,7 @@ coroipcc_service_connect (
request_fd = socket (PF_LOCAL, SOCK_STREAM, 0);
#endif
if (request_fd == -1) {
+ coroipcc_lasterror_perror ("socket() failed");
return (CS_ERR_LIBRARY);
}
#ifdef SO_NOSIGPIPE
@@ -717,6 +810,7 @@ coroipcc_service_connect (
* an existing shared memory segment for which we have access
*/
if (errno != EEXIST && errno != EACCES) {
+ coroipcc_lasterror_perror ("semget() failed");
res = CS_ERR_LIBRARY;
goto error_exit;
}
@@ -725,12 +819,14 @@ coroipcc_service_connect (
semun.val = 0;
sys_res = semctl (ipc_instance->semid, 0, SETVAL, semun);
if (sys_res != 0) {
+ coroipcc_lasterror_perror ("semctl() failed");
res = CS_ERR_LIBRARY;
goto error_exit;
}
sys_res = semctl (ipc_instance->semid, 1, SETVAL, semun);
if (sys_res != 0) {
+ coroipcc_lasterror_perror ("semctl() failed");
res = CS_ERR_LIBRARY;
goto error_exit;
}
@@ -811,6 +907,7 @@ coroipcc_service_disconnect (
res = hdb_error_to_cs (hdb_handle_get (&ipc_hdb, handle, (void **)&ipc_instance));
if (res != CS_OK) {
+ coroipcc_lasterror_set ("hdb_handle_get() failed");
return (res);
}
@@ -831,6 +928,7 @@ coroipcc_dispatch_flow_control_get (
res = hdb_error_to_cs (hdb_handle_get (&ipc_hdb, handle, (void **)&ipc_instance));
if (res != CS_OK) {
+ coroipcc_lasterror_set ("hdb_handle_get() failed");
return (res);
}
@@ -850,6 +948,7 @@ coroipcc_fd_get (
res = hdb_error_to_cs (hdb_handle_get (&ipc_hdb, handle, (void **)&ipc_instance));
if (res != CS_OK) {
+ coroipcc_lasterror_set ("hdb_handle_get() failed");
return (res);
}
@@ -876,6 +975,7 @@ coroipcc_dispatch_get (
error = hdb_error_to_cs (hdb_handle_get (&ipc_hdb, handle, (void **)&ipc_instance));
if (error != CS_OK) {
+ coroipcc_lasterror_set ("hdb_handle_get() failed");
return (error);
}
@@ -891,6 +991,7 @@ coroipcc_dispatch_get (
goto error_put;
} else
if (poll_events == -1) {
+ coroipcc_lasterror_perror ("poll() failed");
error = CS_ERR_LIBRARY;
goto error_put;
} else
@@ -899,6 +1000,8 @@ coroipcc_dispatch_get (
goto error_put;
}
if (poll_events == 1 && (ufds.revents & (POLLERR|POLLHUP))) {
+ coroipcc_lasterror_set ("poll() returned %s",
+ (ufds.revents == POLLHUP ? "POLLHUP" : "POLLERR"));
error = CS_ERR_LIBRARY;
goto error_put;
}
@@ -909,6 +1012,7 @@ coroipcc_dispatch_get (
goto error_put;
} else
if (res == -1) {
+ coroipcc_lasterror_perror ("recv() failed");
error = CS_ERR_LIBRARY;
goto error_put;
} else
@@ -917,6 +1021,7 @@ coroipcc_dispatch_get (
* happen only on BSD and Darwing systems since poll() returns a
* POLLHUP event on other systems.
*/
+ coroipcc_lasterror_perror ("recv(): BSD EOF");
error = CS_ERR_LIBRARY;
goto error_put;
}
@@ -971,6 +1076,7 @@ coroipcc_dispatch_put (hdb_handle_t handle)
res = hdb_error_to_cs (hdb_handle_get_always (&ipc_hdb, handle, (void **)&ipc_instance));
if (res != CS_OK) {
+ coroipcc_lasterror_set ("hdb_handlet_get_always() failed");
return (res);
}
#if _POSIX_THREAD_PROCESS_SHARED > 0
@@ -994,6 +1100,7 @@ retry_semop:
goto retry_semop;
} else
if (res == -1) {
+ coroipcc_lasterror_perror ("semop() failed");
res = CS_ERR_LIBRARY;
goto error_exit;
}
@@ -1032,6 +1139,7 @@ coroipcc_msg_send_reply_receive (
res = hdb_error_to_cs (hdb_handle_get (&ipc_hdb, handle, (void **)&ipc_instance));
if (res != CS_OK) {
+ coroipcc_lasterror_set ("hdb_handle_get() failed");
return (res);
}
@@ -1063,6 +1171,7 @@ coroipcc_msg_send_reply_receive_in_buf_get (
res = hdb_error_to_cs (hdb_handle_get (&ipc_hdb, handle, (void **)&ipc_instance));
if (res != CS_OK) {
+ coroipcc_lasterror_set ("hdb_handle_get() failed");
return (res);
}
@@ -1090,6 +1199,7 @@ coroipcc_msg_send_reply_receive_in_buf_put (
res = hdb_error_to_cs (hdb_handle_get (&ipc_hdb, handle, (void **)&ipc_instance));
if (res != CS_OK) {
+ coroipcc_lasterror_set ("hdb_handle_get() failed");
return (res);
}
hdb_handle_put (&ipc_hdb, handle);
@@ -1117,6 +1227,7 @@ coroipcc_zcb_alloc (
res = hdb_error_to_cs (hdb_handle_get (&ipc_hdb, handle, (void **)&ipc_instance));
if (res != CS_OK) {
+ coroipcc_lasterror_set ("hdb_handle_get() failed");
return (res);
}
map_size = size + header_size + sizeof (struct coroipcs_zc_header);
@@ -1161,6 +1272,7 @@ coroipcc_zcb_free (
res = hdb_error_to_cs (hdb_handle_get (&ipc_hdb, handle, (void **)&ipc_instance));
if (res != CS_OK) {
+ coroipcc_lasterror_set ("hdb_handle_get() failed");
return (res);
}
@@ -1201,6 +1313,7 @@ coroipcc_zcb_msg_send_reply_receive (
res = hdb_error_to_cs (hdb_handle_get (&ipc_hdb, handle, (void **)&ipc_instance));
if (res != CS_OK) {
+ coroipcc_lasterror_set ("hdb_handle_get() failed");
return (res);
}
hdr = (struct coroipcs_zc_header *)(((char *)msg) - sizeof (struct coroipcs_zc_header));
diff --git a/test/cpgverify.c b/test/cpgverify.c
index c7b400a..b0fef5a 100644
--- a/test/cpgverify.c
+++ b/test/cpgverify.c
@@ -41,6 +41,7 @@
#include <arpa/inet.h>
#include <errno.h>
#include <string.h>
+#include <corosync/coroipcc.h>
#include <corosync/corotypes.h>
#include <corosync/cpg.h>
#include "../exec/crypto.h"
@@ -114,13 +115,13 @@ int main (void)
result = cpg_initialize (&handle, &callbacks);
if (result != CS_OK) {
- printf ("Couldn't initialize CPG service %d\n", result);
+ printf ("Couldn't initialize CPG service %d : %s\n", result, coroipcc_lasterror_get());
exit (0);
}
res = cpg_join (handle, &group_name);
if (res != CS_OK) {
- printf ("cpg_join failed with result %d\n", res);
+ printf ("cpg_join failed with result %d : %s\n", res, coroipcc_lasterror_get());
exit (1);
}
_______________________________________________
Openais mailing list
[email protected]
https://lists.linux-foundation.org/mailman/listinfo/openais