Steven Dake a écrit :
On Tue, 2010-03-30 at 09:57 +0200, Jerome Flesch wrote:
Patch looks good for commit

Committed.


responses to questions inline

On Mon, 2010-03-29 at 11:10 +0200, Jerome Flesch wrote:
Hello,

I just had a weird case where SaClmInitialize() returned the value 0 to me. Since this value isn't valid for a enum SaAisErrorT, I investigated and I found out that corosync/coroipcc.c:coroipcc_service_connect() can return 0 in case Corosync was compiled with _POSIX_THREAD_PROCESS_SHARED <= 0 (ie on a FreeBSD system) when there is a problem with the IPC semaphores. This case is actually likely to happen on a FreeBSD system due to IPC default limits.

In the process, I also changed the way the variable 'res' was used: Since it's an enum, I've assumed it doesn't really make sense to use it to store return values of system calls, so I replaced it by sys_res.

Also, I was wondering if there could be a way to get more details when coroipcc_service_connect() fails. CS_ERR_LIBRARY and CS_ERR_TRY_AGAIN are not exactly what I would call meaningful error codes ... It's not very handy to have to try to guess if it failed because permissions on the Unix socket are wrong or because IPC limits have been reached or because an upper layer failed for some reason. I assume adding error codes like CS_ERR_SEM, CS_ERR_SHM, could be an option but letting such level of implementation details be visible in the error codes is not a valid option, right ?

yes i prefer to have less error codes with no implementation details.

Right now, when I have to diagnose this kind of issue, I'm running a patched version of Flatiron including the following macro (in lib/util.h):

+#ifdef DEBUG
+#define DPRINTX(...) do {                                            \
+     fprintf(stderr, "Corosync lib: %s:L%d: ", __FILE__, __LINE__);  \
+     fprintf(stderr, __VA_ARGS__);                                   \
+     fprintf(stderr, "\n");                                          \
+   } while (0)
+#else
+#define DPRINTX(...)
+#endif

Is there a more adequate way of getting debug information from the libraries themselves ?

I was thinking of a slightly more elegant way of doing it. I could add functions like for example: 'void coroipcc_set_lasterror(const char *str)', 'const char *coroipcc_get_lasterror(void)' ? Using pthread_setspecific(), it would be possible to make it thread-safe. Do you think it would be worth I spend some time on making a patch for that ?

sounds interesting.  I wouldn't mind some diagnostic capability coming
out of the apis, but the thread safety was a problem I didn't know how
to solve.  After reading the man page for pthread_setspecific, it looks
like it might solve the problem.

I would prefer apis something like this in coroipcc.c
static void coroipcc_lasterror_set (const char *fmt, ...);

It makes sense. It would allow to get more details like for instance strerror(errno).


and in coroipcc.h
extern const char *coroipcc_lasterror_get (void).

Curious how you plan to create the key and garbage collect the
pthread_key values...

It's pretty simple actually: pthread_key_create() allows to attach a destructor to a pthread key. This destructor is called each time a thread stops, with as argument the pointer stored with pthread_setspecific for this thread. So if we store a simple buffer with pthread_setspecific, we can simply attach 'free()' to the key :)



How do you pthread_key_create?  Use pthread_once in each api call?  I
want to avoid a global init function for apis.  pthread_once may have
performance implications if it takes a mutex each time it runs.  Another
option is to use a constructor section in the library.

Like to see a patch..

And here it is. Of course, it's more a draft than a definitive patch. Also, it would be nice to know if we go for this kind of implementation or if we go for Angus' idea using a callback.

FYI, with this patch, when I run test/cpgverify without having a Corosync running, I get: % ./cpgverify

Couldn't initialize CPG service 6 : coroipcc.c, L750: connect() failed (61: Connection refused)


Regards
-steve

This could have good implications for the other apis beyond coroipcc...

Regards
-steve

_______________________________________________
Openais mailing list
[email protected]
https://lists.linux-foundation.org/mailman/listinfo/openais


diff --git a/include/corosync/coroipcc.h b/include/corosync/coroipcc.h
index bc912ca..fb7c940 100644
--- a/include/corosync/coroipcc.h
+++ b/include/corosync/coroipcc.h
@@ -117,6 +117,8 @@ coroipcc_zcb_msg_send_reply_receive (
 	void *res_msg,
 	size_t res_len);
 
+extern const char *coroipcc_lasterror_get(void);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/coroipcc.c b/lib/coroipcc.c
index 06e4d75..825bf60 100644
--- a/lib/coroipcc.c
+++ b/lib/coroipcc.c
@@ -37,6 +37,7 @@
 
 #include <config.h>
 
+#include <stdarg.h>
 #include <stdlib.h>
 #include <stdio.h>
 #include <unistd.h>
@@ -116,6 +117,67 @@ static void socket_nosigpipe(int s)
 #define MSG_NOSIGNAL 0
 #endif
 
+#define LASTERROR_BUFFER_SIZE 512
+
+static pthread_once_t once_control = PTHREAD_ONCE_INIT;
+static pthread_key_t lasterror_key;
+
+static void lasterror_init (void)
+{
+	pthread_key_create (&lasterror_key, free);
+}
+
+static inline char *coroipcc_lasterror_get_buffer (void)
+{
+	if (pthread_once (&once_control, lasterror_init) < 0)
+		return NULL;
+	return pthread_getspecific (lasterror_key);
+}
+
+static void _coroipcc_lasterror_set (const char *file, int line, int err, const char *fmt, ...) \
+				__attribute__ ((format (printf, 4, 5)));
+#define coroipcc_lasterror_set(...) _coroipcc_lasterror_set (__FILE__, __LINE__, 0, __VA_ARGS__)
+#define coroipcc_lasterror_perror(...) _coroipcc_lasterror_set (__FILE__, __LINE__, errno, __VA_ARGS__)
+
+static void _coroipcc_lasterror_set (const char *file, int line, int err, const char *fmt, ...)
+{
+	va_list ap;
+	char *buffer;
+	int len;
+
+	buffer = coroipcc_lasterror_get_buffer ();
+	if (buffer == NULL)
+	{
+		buffer = malloc (LASTERROR_BUFFER_SIZE);
+		if (buffer == NULL || pthread_setspecific (lasterror_key, buffer) < 0)
+		{
+			free (buffer);
+			return;
+		}
+		buffer[0] = '\0';
+	}
+
+	snprintf (buffer, LASTERROR_BUFFER_SIZE, "%s, L%u: ", file, line);
+	len = strlen (buffer);
+
+	va_start (ap, fmt);
+	vsnprintf (buffer + len, LASTERROR_BUFFER_SIZE - len, fmt, ap);
+	va_end (ap);
+
+	if (err > 0)
+	{
+		char strerr[128];
+		strerror_r (err, strerr, sizeof(strerr));
+		len = strlen (buffer);
+		snprintf (buffer + len, LASTERROR_BUFFER_SIZE - len, " (%u: %s)", errno, strerr);
+	}
+}
+
+const char *coroipcc_lasterror_get (void)
+{
+	return coroipcc_lasterror_get_buffer ();
+}
+
 static cs_error_t
 socket_send (
 	int s,
@@ -157,6 +219,7 @@ retry_send:
 			goto retry_send;
 			break;
 		default:
+			coroipcc_lasterror_perror ("sendmsg() failed");
 			res = CS_ERR_LIBRARY;
 			goto res_exit;
 		}
@@ -214,6 +277,7 @@ retry_recv:
 			goto retry_recv;
 			break;
 		default:
+			coroipcc_lasterror_perror ("recvmsg() failed");
 			res = CS_ERR_LIBRARY;
 			goto res_exit;
 		}
@@ -223,6 +287,7 @@ retry_recv:
 	 * EOF is detected when recvmsg return 0.
 	 */
 	if (result == 0) {
+		coroipcc_lasterror_set ("recvmsg(): BSD EOF");
 		res = CS_ERR_LIBRARY;
 		goto res_exit;
 	}
@@ -295,6 +360,7 @@ circular_memory_map (char *path, const char *file, void **buf, size_t bytes)
 		sprintf (path, LOCALSTATEDIR "/run/%s", file);
 		fd = mkstemp (path);
 		if (fd == -1) {
+			coroipcc_lasterror_perror ("mkstemp(%s) failed", path);
 			return (-1);
 		}
 	}
@@ -305,6 +371,7 @@ circular_memory_map (char *path, const char *file, void **buf, size_t bytes)
 		MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
 
 	if (addr_orig == MAP_FAILED) {
+		coroipcc_lasterror_perror ("mmap() failed");
 		return (-1);
 	}
 
@@ -312,6 +379,11 @@ circular_memory_map (char *path, const char *file, void **buf, size_t bytes)
 		MAP_FIXED | MAP_SHARED, fd, 0);
 
 	if (addr != addr_orig) {
+		if (addr == MAP_FAILED)
+			coroipcc_lasterror_perror ("mmap() failed");
+		else
+			coroipcc_lasterror_set ("mmap() didn't map the memory at the expected address (%p instead of %p)",
+					addr, addr_orig);
 		return (-1);
 	}
 #ifdef COROSYNC_BSD
@@ -327,6 +399,7 @@ circular_memory_map (char *path, const char *file, void **buf, size_t bytes)
 
 	res = close (fd);
 	if (res) {
+		coroipcc_lasterror_perror("close() failed");
 		return (-1);
 	}
 	*buf = addr_orig;
@@ -367,6 +440,7 @@ memory_map (char *path, const char *file, void **buf, size_t bytes)
 		sprintf (path, LOCALSTATEDIR "/run/%s", file);
 		fd = mkstemp (path);
 		if (fd == -1) {
+			coroipcc_lasterror_perror ("mkstemp(%s) failed", path);
 			return (-1);
 		}
 	}
@@ -377,6 +451,7 @@ memory_map (char *path, const char *file, void **buf, size_t bytes)
 		MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
 
 	if (addr_orig == MAP_FAILED) {
+		coroipcc_lasterror_perror ("mmap() failed");
 		return (-1);
 	}
 
@@ -384,6 +459,11 @@ memory_map (char *path, const char *file, void **buf, size_t bytes)
 		MAP_FIXED | MAP_SHARED, fd, 0);
 
 	if (addr != addr_orig) {
+		if (addr == MAP_FAILED)
+			coroipcc_lasterror_perror ("mmap() failed");
+		else
+			coroipcc_lasterror_set ("mmap() didn't map the memory at the expected address (%p instead of %p)",
+					addr, addr_orig);
 		return (-1);
 	}
 #ifdef COROSYNC_BSD
@@ -392,6 +472,7 @@ memory_map (char *path, const char *file, void **buf, size_t bytes)
 
 	res = close (fd);
 	if (res) {
+		coroipcc_lasterror_perror ("close() failed");
 		return (-1);
 	}
 	*buf = addr_orig;
@@ -415,6 +496,9 @@ msg_send (
 	for (i = 0; i < iov_len; i++) {
 		if ((req_buffer_idx + iov[i].iov_len) > 
 			ipc_instance->request_size) {
+			coroipcc_lasterror_set ("req_buffer_idx (%u) + iov[i].iov_len (%u)"
+					" > ipc_instance->request_size (%u)",
+					req_buffer_idx, iov[i].iov_len, ipc_instance->request_size);
 			return (CS_ERR_INVALID_PARAM);
 		}
 		memcpy (&ipc_instance->request_buffer[req_buffer_idx],
@@ -426,6 +510,7 @@ msg_send (
 #if _POSIX_THREAD_PROCESS_SHARED > 0
 	res = sem_post (&ipc_instance->control_buffer->sem0);
 	if (res == -1) {
+		coroipcc_lasterror_perror ("sem_post() failed");
 		return (CS_ERR_LIBRARY);
 	}
 #else 
@@ -447,6 +532,7 @@ retry_semop:
 		goto retry_semop;
 	} else
 	if (res == -1) {
+		coroipcc_lasterror_perror ("semop() failed");
 		return (CS_ERR_LIBRARY);
 	}
 #endif
@@ -480,6 +566,8 @@ retry_semwait:
 
 		poll (&pfd, 1, 0);
 		if (pfd.revents == POLLERR || pfd.revents == POLLHUP) {
+			coroipcc_lasterror_set ("EOF: poll() returned %s",
+					(pfd.revents == POLLERR ? "POLLERR" : "POLLHUP"));
 			return (CS_ERR_LIBRARY);
 		}
 
@@ -508,6 +596,7 @@ retry_semop:
 		goto retry_semop;
 	} else
 	if (res == -1) {
+		coroipcc_lasterror_perror ("semop() failed");
 		return (CS_ERR_LIBRARY);
 	}
 #endif
@@ -546,6 +635,8 @@ retry_semwait:
 
 		poll (&pfd, 1, 0);
 		if (pfd.revents == POLLERR || pfd.revents == POLLHUP) {
+			coroipcc_lasterror_set ("poll() returned %s",
+					(pfd.revents == POLLERR ? "POLLERR" : "POLLHUP"));
 			return (CS_ERR_LIBRARY);
 		}
 
@@ -574,6 +665,7 @@ retry_semop:
 		goto retry_semop;
 	} else
 	if (res == -1) {
+		coroipcc_lasterror_perror ("semop() failed");
 		return (CS_ERR_LIBRARY);
 	}
 #endif
@@ -630,6 +722,7 @@ coroipcc_service_connect (
 	request_fd = socket (PF_LOCAL, SOCK_STREAM, 0);
 #endif
 	if (request_fd == -1) {
+		coroipcc_lasterror_perror ("socket() failed");
 		return (CS_ERR_LIBRARY);
 	}
 #ifdef SO_NOSIGPIPE
@@ -717,6 +810,7 @@ coroipcc_service_connect (
 		 * an existing shared memory segment for which we have access
 		 */
 		if (errno != EEXIST && errno != EACCES) {
+			coroipcc_lasterror_perror ("semget() failed");
 			res = CS_ERR_LIBRARY;
 			goto error_exit;
 		}
@@ -725,12 +819,14 @@ coroipcc_service_connect (
 	semun.val = 0;
 	sys_res = semctl (ipc_instance->semid, 0, SETVAL, semun);
 	if (sys_res != 0) {
+		coroipcc_lasterror_perror ("semctl() failed");
 		res = CS_ERR_LIBRARY;
 		goto error_exit;
 	}
 
 	sys_res = semctl (ipc_instance->semid, 1, SETVAL, semun);
 	if (sys_res != 0) {
+		coroipcc_lasterror_perror ("semctl() failed");
 		res = CS_ERR_LIBRARY;
 		goto error_exit;
 	}
@@ -811,6 +907,7 @@ coroipcc_service_disconnect (
 
 	res = hdb_error_to_cs (hdb_handle_get (&ipc_hdb, handle, (void **)&ipc_instance));
 	if (res != CS_OK) {
+		coroipcc_lasterror_set ("hdb_handle_get() failed");
 		return (res);
 	}
 
@@ -831,6 +928,7 @@ coroipcc_dispatch_flow_control_get (
 
 	res = hdb_error_to_cs (hdb_handle_get (&ipc_hdb, handle, (void **)&ipc_instance));
 	if (res != CS_OK) {
+		coroipcc_lasterror_set ("hdb_handle_get() failed");
 		return (res);
 	}
 
@@ -850,6 +948,7 @@ coroipcc_fd_get (
 
 	res = hdb_error_to_cs (hdb_handle_get (&ipc_hdb, handle, (void **)&ipc_instance));
 	if (res != CS_OK) {
+		coroipcc_lasterror_set ("hdb_handle_get() failed");
 		return (res);
 	}
 
@@ -876,6 +975,7 @@ coroipcc_dispatch_get (
 
 	error = hdb_error_to_cs (hdb_handle_get (&ipc_hdb, handle, (void **)&ipc_instance));
 	if (error != CS_OK) {
+		coroipcc_lasterror_set ("hdb_handle_get() failed");
 		return (error);
 	}
 
@@ -891,6 +991,7 @@ coroipcc_dispatch_get (
 		goto error_put;
 	} else
 	if (poll_events == -1) {
+		coroipcc_lasterror_perror ("poll() failed");
 		error = CS_ERR_LIBRARY;
 		goto error_put;
 	} else
@@ -899,6 +1000,8 @@ coroipcc_dispatch_get (
 		goto error_put;
 	}
 	if (poll_events == 1 && (ufds.revents & (POLLERR|POLLHUP))) {
+		coroipcc_lasterror_set ("poll() returned %s",
+				(ufds.revents == POLLHUP ? "POLLHUP" : "POLLERR"));
 		error = CS_ERR_LIBRARY;
 		goto error_put;
 	}
@@ -909,6 +1012,7 @@ coroipcc_dispatch_get (
 		goto error_put;
 	} else
 	if (res == -1) {
+		coroipcc_lasterror_perror ("recv() failed");
 		error = CS_ERR_LIBRARY;
 		goto error_put;
 	} else
@@ -917,6 +1021,7 @@ coroipcc_dispatch_get (
 		 * happen only on BSD and Darwing systems since poll() returns a
 		 * POLLHUP event on other systems.
 		 */
+		coroipcc_lasterror_perror ("recv(): BSD EOF");
 		error = CS_ERR_LIBRARY;
 		goto error_put;
 	}
@@ -971,6 +1076,7 @@ coroipcc_dispatch_put (hdb_handle_t handle)
 
 	res = hdb_error_to_cs (hdb_handle_get_always (&ipc_hdb, handle, (void **)&ipc_instance));
 	if (res != CS_OK) {
+		coroipcc_lasterror_set ("hdb_handlet_get_always() failed");
 		return (res);
 	}
 #if _POSIX_THREAD_PROCESS_SHARED > 0
@@ -994,6 +1100,7 @@ retry_semop:
 		goto retry_semop;
 	} else
 	if (res == -1) {
+		coroipcc_lasterror_perror ("semop() failed");
 		res = CS_ERR_LIBRARY;
 		goto error_exit;
 	}
@@ -1032,6 +1139,7 @@ coroipcc_msg_send_reply_receive (
 
 	res = hdb_error_to_cs (hdb_handle_get (&ipc_hdb, handle, (void **)&ipc_instance));
 	if (res != CS_OK) {
+		coroipcc_lasterror_set ("hdb_handle_get() failed");
 		return (res);
 	}
 
@@ -1063,6 +1171,7 @@ coroipcc_msg_send_reply_receive_in_buf_get (
 
 	res = hdb_error_to_cs (hdb_handle_get (&ipc_hdb, handle, (void **)&ipc_instance));
 	if (res != CS_OK) {
+		coroipcc_lasterror_set ("hdb_handle_get() failed");
 		return (res);
 	}
 
@@ -1090,6 +1199,7 @@ coroipcc_msg_send_reply_receive_in_buf_put (
 
 	res = hdb_error_to_cs (hdb_handle_get (&ipc_hdb, handle, (void **)&ipc_instance));
 	if (res != CS_OK) {
+		coroipcc_lasterror_set ("hdb_handle_get() failed");
 		return (res);
 	}
 	hdb_handle_put (&ipc_hdb, handle);
@@ -1117,6 +1227,7 @@ coroipcc_zcb_alloc (
 
 	res = hdb_error_to_cs (hdb_handle_get (&ipc_hdb, handle, (void **)&ipc_instance));
 	if (res != CS_OK) {
+		coroipcc_lasterror_set ("hdb_handle_get() failed");
 		return (res);
 	}
 	map_size = size + header_size + sizeof (struct coroipcs_zc_header);
@@ -1161,6 +1272,7 @@ coroipcc_zcb_free (
 
 	res = hdb_error_to_cs (hdb_handle_get (&ipc_hdb, handle, (void **)&ipc_instance));
 	if (res != CS_OK) {
+		coroipcc_lasterror_set ("hdb_handle_get() failed");
 		return (res);
 	}
 
@@ -1201,6 +1313,7 @@ coroipcc_zcb_msg_send_reply_receive (
 
 	res = hdb_error_to_cs (hdb_handle_get (&ipc_hdb, handle, (void **)&ipc_instance));
 	if (res != CS_OK) {
+		coroipcc_lasterror_set ("hdb_handle_get() failed");
 		return (res);
 	}
 	hdr = (struct coroipcs_zc_header *)(((char *)msg) - sizeof (struct coroipcs_zc_header));
diff --git a/test/cpgverify.c b/test/cpgverify.c
index c7b400a..b0fef5a 100644
--- a/test/cpgverify.c
+++ b/test/cpgverify.c
@@ -41,6 +41,7 @@
 #include <arpa/inet.h>
 #include <errno.h>
 #include <string.h>
+#include <corosync/coroipcc.h>
 #include <corosync/corotypes.h>
 #include <corosync/cpg.h>
 #include "../exec/crypto.h"
@@ -114,13 +115,13 @@ int main (void)
 
 	result = cpg_initialize (&handle, &callbacks);
 	if (result != CS_OK) {
-		printf ("Couldn't initialize CPG service %d\n", result);
+		printf ("Couldn't initialize CPG service %d : %s\n", result, coroipcc_lasterror_get());
 		exit (0);
 	}
 
         res = cpg_join (handle, &group_name);
         if (res != CS_OK) {
-                printf ("cpg_join failed with result %d\n", res);
+                printf ("cpg_join failed with result %d : %s\n", res, coroipcc_lasterror_get());
                 exit (1);
         }
 
_______________________________________________
Openais mailing list
[email protected]
https://lists.linux-foundation.org/mailman/listinfo/openais

Reply via email to