guy keren wrote:
> Ulrich Windl wrote:
>> On 3 Nov 2009 at 2:54, guy keren wrote:
>>
>>> Hi,
>>>
>>> the logging code in open-iscsi uses a "logarea" structure in shared 
>>> memory protected by a SysV semaphore (using semop system calls) - and 
>>> also places the sembuf array structure used in the semop calls in this 
>>> same shared-memory area (i.e. inside the logarea struct that is 
>>> allocated in shared memory at function logarea_init).
>>>
>>> as a result, both the iscsid logging process and the iscsid control 
>>> process attempt to use this structure in a non-synchronized manner, 
>> The fact that the logging and the control process use the shared memory in a 
>> unsynchronized way seems unrelated to the fact that both structures are 
>> located in 
>> the same memory area, or I didn't understand your statement. For performance 
>> reasons it seems wise to locate the controlling semaphores close to the area 
>> being 
>> controlled.
>>
>>> which is racy and may result either a deadlock or data corruption (we 
>>> saw these deadlocks several times).
>>>
>>> the relevant code of the logging process is in usr/log.c, function 
>>> log_flush(). the relevant code of the control process is in the same 
>>> file, function dolog().
>>>
>>> the deadlock senario:
>>>
>>>     1. the logging process has the semaphore held. the control process is
>>>        doing some work.
>>>     2. the logging process is about to release the semaphore. it sets 
>>> the sem_op parameter in the sembuf structure to '1'.
>>>     3. the control process now wants to add a logging record. it sets 
>>> the sem_op parameter in the sembuf structure to '-1'.
>> Ah, I understand: not the semaphore structure is in shared memory, but the 
>> parameter structure for calling the semop(). OK, that's bad. Probably those 
>> structures should be local (on the stack). I was confused with POSIX 
>> semaphores 
>> where shared memory is required.
>>
>>>     4. the control process invokes semop and gets blocked (because the 
>>> semaphore is held by the logging process).
>>>     5. the logging process invokes semop and also gets blocked for the 
>>> same reason.
>>>
>>>     we're in deadlock.
>> Good spotting!
>>
>> Ulrich
>>
>>> to get a data corruption, we'll need a slightly different scheduling - 
>>> i.e. that the process that wants to take the lock will update the sembuf 
>>> struct first - and then the process releasing the lock would modify 
>>> sem_op to '1' - and we'll have both processes increasing the semaphore's 
>>> value instead of one increasing and one decreasing it - and thus the 
>>> semaphore's value will later allow both processes to grab the semaphore 
>>> at the same time.
>>>
>>> solution:
>>>
>>> to solve this, i have created a local variable on the stack of both 
>>> processes, that is used with the semop calls. another possible solution 
>>> is to move the sembuf structure to a global variable that is NOT placed 
>>> in shared memory.
>>>
>>> before i send a patch - is there a preference either way? or some other way?
>>>
>>> thanks,
>>> --guy
>>>
> 
> (finally) attached is the patch:
> 
> each process must have its own semarg structure - or they step on each 
> others' toes - which could cause either deadlocks or smearing of the 
> shared memory protected by the semaphore.
> 

Sorry for the late reply. Looks ok. Merged in 
3300b26934848cb674390d86f09a91edf2c71980.

--

You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To post to this group, send email to open-is...@googlegroups.com.
To unsubscribe from this group, send email to 
open-iscsi+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/open-iscsi?hl=en.


Reply via email to