Ulrich Windl wrote:
> On 3 Nov 2009 at 2:54, guy keren wrote:
> 
>>
>> Hi,
>>
>> the logging code in open-iscsi uses a "logarea" structure in shared 
>> memory protected by a SysV semaphore (using semop system calls) - and 
>> also places the sembuf array structure used in the semop calls in this 
>> same shared-memory area (i.e. inside the logarea struct that is 
>> allocated in shared memory at function logarea_init).
>>
>> as a result, both the iscsid logging process and the iscsid control 
>> process attempt to use this structure in a non-synchronized manner, 
> 
> The fact that the logging and the control process use the shared memory in a 
> unsynchronized way seems unrelated to the fact that both structures are 
> located in 
> the same memory area, or I didn't understand your statement. For performance 
> reasons it seems wise to locate the controlling semaphores close to the area 
> being 
> controlled.
> 
>> which is racy and may result either a deadlock or data corruption (we 
>> saw these deadlocks several times).
>>
>> the relevant code of the logging process is in usr/log.c, function 
>> log_flush(). the relevant code of the control process is in the same 
>> file, function dolog().
>>
>> the deadlock senario:
>>
>>     1. the logging process has the semaphore held. the control process is
>>        doing some work.
>>     2. the logging process is about to release the semaphore. it sets 
>> the sem_op parameter in the sembuf structure to '1'.
>>     3. the control process now wants to add a logging record. it sets 
>> the sem_op parameter in the sembuf structure to '-1'.
> 
> Ah, I understand: not the semaphore structure is in shared memory, but the 
> parameter structure for calling the semop(). OK, that's bad. Probably those 
> structures should be local (on the stack). I was confused with POSIX 
> semaphores 
> where shared memory is required.
> 
>>     4. the control process invokes semop and gets blocked (because the 
>> semaphore is held by the logging process).
>>     5. the logging process invokes semop and also gets blocked for the 
>> same reason.
>>
>>     we're in deadlock.
> 
> Good spotting!
> 
> Ulrich
> 
>> to get a data corruption, we'll need a slightly different scheduling - 
>> i.e. that the process that wants to take the lock will update the sembuf 
>> struct first - and then the process releasing the lock would modify 
>> sem_op to '1' - and we'll have both processes increasing the semaphore's 
>> value instead of one increasing and one decreasing it - and thus the 
>> semaphore's value will later allow both processes to grab the semaphore 
>> at the same time.
>>
>> solution:
>>
>> to solve this, i have created a local variable on the stack of both 
>> processes, that is used with the semop calls. another possible solution 
>> is to move the sembuf structure to a global variable that is NOT placed 
>> in shared memory.
>>
>> before i send a patch - is there a preference either way? or some other way?
>>
>> thanks,
>> --guy
>>

(finally) attached is the patch:

each process must have its own semarg structure - or they step on each 
others' toes - which could cause either deadlocks or smearing of the 
shared memory protected by the semaphore.

Signed-off-by: guy keren <c...@actcom.co.il>

--guy

--

You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To post to this group, send email to open-is...@googlegroups.com.
To unsubscribe from this group, send email to 
open-iscsi+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/open-iscsi?hl=.


[PATCH 1/3] do not use a semarg in shared-mem for semop calls

each process must have its own semarg structure - or they step on each others'
toes - which could cause either deadlocks or smearing of the shared memory
protected by the semaphore.

Signed-off-by: guy keren <c...@actcom.co.il>
---
 usr/log.c |   26 ++++++++++++++++++--------
 1 files changed, 18 insertions(+), 8 deletions(-)

diff --git a/usr/log.c b/usr/log.c
index 4351456..c84ddd9 100644
--- a/usr/log.c
+++ b/usr/log.c
@@ -261,16 +261,21 @@ static void log_syslog (void * buff)
 static void dolog(int prio, const char *fmt, va_list ap)
 {
 	if (log_daemon) {
-		la->ops[0].sem_op = -1;
-		if (semop(la->semid, la->ops, 1) < 0) {
+		struct sembuf ops[1];
+
+		ops[0].sem_num = la->ops[0].sem_num;
+		ops[0].sem_flg = la->ops[0].sem_flg;
+
+		ops[0].sem_op = -1;
+		if (semop(la->semid, ops, 1) < 0) {
 			syslog(LOG_ERR, "semop up failed %d", errno);
 			return;
 		}
 
 		log_enqueue(prio, fmt, ap);
 
-		la->ops[0].sem_op = 1;
-		if (semop(la->semid, la->ops, 1) < 0) {
+		ops[0].sem_op = 1;
+		if (semop(la->semid, ops, 1) < 0) {
 			syslog(LOG_ERR, "semop down failed");
 			return;
 		}
@@ -345,16 +350,21 @@ static void __dump_char(int level, unsigned char *buf, int *cp, int ch)
 static void log_flush(void)
 {
 	int msglen;
+	struct sembuf ops[1];
+
+	ops[0].sem_num = la->ops[0].sem_num;
+	ops[0].sem_flg = la->ops[0].sem_flg;
+
 
 	while (!la->empty) {
-		la->ops[0].sem_op = -1;
-		if (semop(la->semid, la->ops, 1) < 0) {
+		ops[0].sem_op = -1;
+		if (semop(la->semid, ops, 1) < 0) {
 			syslog(LOG_ERR, "semop up failed %d", errno);
 			exit(1);
 		}
 		msglen = log_dequeue(la->buff);
-		la->ops[0].sem_op = 1;
-		if (semop(la->semid, la->ops, 1) < 0) {
+		ops[0].sem_op = 1;
+		if (semop(la->semid, ops, 1) < 0) {
 			syslog(LOG_ERR, "semop down failed");
 			exit(1);
 		}
-- 
1.6.3.3

Reply via email to