guy keren wrote: > Ulrich Windl wrote: >> On 3 Nov 2009 at 2:54, guy keren wrote: >> >>> Hi, >>> >>> the logging code in open-iscsi uses a "logarea" structure in shared >>> memory protected by a SysV semaphore (using semop system calls) - and >>> also places the sembuf array structure used in the semop calls in this >>> same shared-memory area (i.e. inside the logarea struct that is >>> allocated in shared memory at function logarea_init). >>> >>> as a result, both the iscsid logging process and the iscsid control >>> process attempt to use this structure in a non-synchronized manner, >> The fact that the logging and the control process use the shared memory in a >> unsynchronized way seems unrelated to the fact that both structures are >> located in >> the same memory area, or I didn't understand your statement. For performance >> reasons it seems wise to locate the controlling semaphores close to the area >> being >> controlled. >> >>> which is racy and may result either a deadlock or data corruption (we >>> saw these deadlocks several times). >>> >>> the relevant code of the logging process is in usr/log.c, function >>> log_flush(). the relevant code of the control process is in the same >>> file, function dolog(). >>> >>> the deadlock senario: >>> >>> 1. the logging process has the semaphore held. the control process is >>> doing some work. >>> 2. the logging process is about to release the semaphore. it sets >>> the sem_op parameter in the sembuf structure to '1'. >>> 3. the control process now wants to add a logging record. it sets >>> the sem_op parameter in the sembuf structure to '-1'. >> Ah, I understand: not the semaphore structure is in shared memory, but the >> parameter structure for calling the semop(). OK, that's bad. Probably those >> structures should be local (on the stack). I was confused with POSIX >> semaphores >> where shared memory is required. >> >>> 4. the control process invokes semop and gets blocked (because the >>> semaphore is held by the logging process). >>> 5. the logging process invokes semop and also gets blocked for the >>> same reason. >>> >>> we're in deadlock. >> Good spotting! >> >> Ulrich >> >>> to get a data corruption, we'll need a slightly different scheduling - >>> i.e. that the process that wants to take the lock will update the sembuf >>> struct first - and then the process releasing the lock would modify >>> sem_op to '1' - and we'll have both processes increasing the semaphore's >>> value instead of one increasing and one decreasing it - and thus the >>> semaphore's value will later allow both processes to grab the semaphore >>> at the same time. >>> >>> solution: >>> >>> to solve this, i have created a local variable on the stack of both >>> processes, that is used with the semop calls. another possible solution >>> is to move the sembuf structure to a global variable that is NOT placed >>> in shared memory. >>> >>> before i send a patch - is there a preference either way? or some other way? >>> >>> thanks, >>> --guy >>> > > (finally) attached is the patch: > > each process must have its own semarg structure - or they step on each > others' toes - which could cause either deadlocks or smearing of the > shared memory protected by the semaphore. >
Sorry for the late reply. Looks ok. Merged in 3300b26934848cb674390d86f09a91edf2c71980. -- You received this message because you are subscribed to the Google Groups "open-iscsi" group. To post to this group, send email to open-is...@googlegroups.com. To unsubscribe from this group, send email to open-iscsi+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/open-iscsi?hl=en.