[jira] [Commented] (TS-947) AIO Race condition on non NT systems

2013-09-04 Thread Leif Hedstrom (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13758195#comment-13758195
 ] 

Leif Hedstrom commented on TS-947:
--

Can this be closed ? It sounds like it can ?

 AIO Race condition on non NT systems
 

 Key: TS-947
 URL: https://issues.apache.org/jira/browse/TS-947
 Project: Traffic Server
  Issue Type: Bug
  Components: Core
 Environment: stock build with static libts, running on a 4 core server
Reporter: B Wyatt
Assignee: John Plevyak
 Fix For: 4.2.0

 Attachments: lock-safe-AIO.patch, timed-wait-AIO.patch


 Refer to code below.  The timeslice starting when a consumer thread 
 determines that the temp_list is empty (A) and ending when it releases the 
 aio_mutex(C) is unsafe if the work queues are empty and it breaks loop 
 execution at B.  During this timeslice (A-C) the consumer holds the aio_mutex 
 and as a result request producers enqueue items on the temporary atomic list 
 (D).  As a consumer in this state will wait for a signal on aio_cond to 
 proceed before processing the temp_list again, any requests on the temp_list 
 are effectively stalled until a future request produces this signal or 
 manually processes the temp_list.
 In the case of cache volume initialization, there is no future request and 
 the initialization sequence soft locks. 
 {code:title=iocore/aio/AIO.cc(annotated)}
 void *
 aio_thread_main(void *arg)
 {
   ...
   ink_mutex_acquire(my_aio_req-aio_mutex);
   for (;;) {
 do {
   current_req = my_aio_req;
   /* check if any pending requests on the atomic list */
 A  if (!INK_ATOMICLIST_EMPTY(my_aio_req-aio_temp_list))
 aio_move(my_aio_req);
   if (!(op = my_aio_req-aio_todo.pop())  !(op =
 my_aio_req-http_aio_todo.pop()))
 Bbreak;
   ...
   service request
   ...
 } while (1);
 Cink_cond_wait(my_aio_req-aio_cond, my_aio_req-aio_mutex);
   }
   ...
 }
 static void
 aio_queue_req(AIOCallbackInternal *op, int fromAPI = 0)
 {
   ...
   if (!ink_mutex_try_acquire(req-aio_mutex)) {
 Dink_atomiclist_push(req-aio_temp_list, op);
   } else {
 /* check if any pending requests on the atomic list */
 if (!INK_ATOMICLIST_EMPTY(req-aio_temp_list))
   aio_move(req);
 /* now put the new request */
 aio_insert(op, req);
 ink_cond_signal(req-aio_cond);
 ink_mutex_release(req-aio_mutex);
   }
   ...
 }
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (TS-947) AIO Race condition on non NT systems

2013-09-04 Thread John Plevyak (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13758339#comment-13758339
 ] 

John Plevyak commented on TS-947:
-

Yes, this has been fixed.

john





 AIO Race condition on non NT systems
 

 Key: TS-947
 URL: https://issues.apache.org/jira/browse/TS-947
 Project: Traffic Server
  Issue Type: Bug
  Components: Core
 Environment: stock build with static libts, running on a 4 core server
Reporter: B Wyatt
Assignee: John Plevyak
 Fix For: 4.2.0

 Attachments: lock-safe-AIO.patch, timed-wait-AIO.patch


 Refer to code below.  The timeslice starting when a consumer thread 
 determines that the temp_list is empty (A) and ending when it releases the 
 aio_mutex(C) is unsafe if the work queues are empty and it breaks loop 
 execution at B.  During this timeslice (A-C) the consumer holds the aio_mutex 
 and as a result request producers enqueue items on the temporary atomic list 
 (D).  As a consumer in this state will wait for a signal on aio_cond to 
 proceed before processing the temp_list again, any requests on the temp_list 
 are effectively stalled until a future request produces this signal or 
 manually processes the temp_list.
 In the case of cache volume initialization, there is no future request and 
 the initialization sequence soft locks. 
 {code:title=iocore/aio/AIO.cc(annotated)}
 void *
 aio_thread_main(void *arg)
 {
   ...
   ink_mutex_acquire(my_aio_req-aio_mutex);
   for (;;) {
 do {
   current_req = my_aio_req;
   /* check if any pending requests on the atomic list */
 A  if (!INK_ATOMICLIST_EMPTY(my_aio_req-aio_temp_list))
 aio_move(my_aio_req);
   if (!(op = my_aio_req-aio_todo.pop())  !(op =
 my_aio_req-http_aio_todo.pop()))
 Bbreak;
   ...
   service request
   ...
 } while (1);
 Cink_cond_wait(my_aio_req-aio_cond, my_aio_req-aio_mutex);
   }
   ...
 }
 static void
 aio_queue_req(AIOCallbackInternal *op, int fromAPI = 0)
 {
   ...
   if (!ink_mutex_try_acquire(req-aio_mutex)) {
 Dink_atomiclist_push(req-aio_temp_list, op);
   } else {
 /* check if any pending requests on the atomic list */
 if (!INK_ATOMICLIST_EMPTY(req-aio_temp_list))
   aio_move(req);
 /* now put the new request */
 aio_insert(op, req);
 ink_cond_signal(req-aio_cond);
 ink_mutex_release(req-aio_mutex);
   }
   ...
 }
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (TS-947) AIO Race condition on non NT systems

2011-10-20 Thread Alan M. Carroll (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13131692#comment-13131692
 ] 

Alan M. Carroll commented on TS-947:


The timed wait fix was put on trunk, which ameliorates this problem. However, 
the mailing list consensus was that this is a hack and the queuing logic should 
be fixed to be correct. John Pleyvack has volunteered to take lead on that 
although he welcomes contributions from others.

 AIO Race condition on non NT systems
 

 Key: TS-947
 URL: https://issues.apache.org/jira/browse/TS-947
 Project: Traffic Server
  Issue Type: Bug
  Components: Core
 Environment: stock build with static libts, running on a 4 core server
Reporter: B Wyatt
Assignee: Alan M. Carroll
 Fix For: 3.1.2

 Attachments: lock-safe-AIO.patch


 Refer to code below.  The timeslice starting when a consumer thread 
 determines that the temp_list is empty (A) and ending when it releases the 
 aio_mutex(C) is unsafe if the work queues are empty and it breaks loop 
 execution at B.  During this timeslice (A-C) the consumer holds the aio_mutex 
 and as a result request producers enqueue items on the temporary atomic list 
 (D).  As a consumer in this state will wait for a signal on aio_cond to 
 proceed before processing the temp_list again, any requests on the temp_list 
 are effectively stalled until a future request produces this signal or 
 manually processes the temp_list.
 In the case of cache volume initialization, there is no future request and 
 the initialization sequence soft locks. 
 {code:title=iocore/aio/AIO.cc(annotated)}
 void *
 aio_thread_main(void *arg)
 {
   ...
   ink_mutex_acquire(my_aio_req-aio_mutex);
   for (;;) {
 do {
   current_req = my_aio_req;
   /* check if any pending requests on the atomic list */
 A  if (!INK_ATOMICLIST_EMPTY(my_aio_req-aio_temp_list))
 aio_move(my_aio_req);
   if (!(op = my_aio_req-aio_todo.pop())  !(op =
 my_aio_req-http_aio_todo.pop()))
 Bbreak;
   ...
   service request
   ...
 } while (1);
 Cink_cond_wait(my_aio_req-aio_cond, my_aio_req-aio_mutex);
   }
   ...
 }
 static void
 aio_queue_req(AIOCallbackInternal *op, int fromAPI = 0)
 {
   ...
   if (!ink_mutex_try_acquire(req-aio_mutex)) {
 Dink_atomiclist_push(req-aio_temp_list, op);
   } else {
 /* check if any pending requests on the atomic list */
 if (!INK_ATOMICLIST_EMPTY(req-aio_temp_list))
   aio_move(req);
 /* now put the new request */
 aio_insert(op, req);
 ink_cond_signal(req-aio_cond);
 ink_mutex_release(req-aio_mutex);
   }
   ...
 }
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (TS-947) AIO Race condition on non NT systems

2011-10-20 Thread Alan M. Carroll (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13131804#comment-13131804
 ] 

Alan M. Carroll commented on TS-947:


[Mailing list 
discussion|http://mail-archives.apache.org/mod_mbox/trafficserver-dev/201109.mbox/%3ccamkwmbeeu8uqa+ymaqvsoqgy0hajjbuevdex17eqjbog9v7...@mail.gmail.com%3E]

Key quote by John Pleyvak

{quote}So the situation is that the existing design dates from a time when
threads and scheduling for Unix were primitive.  It was not uncommon in
those
days for threads to go away for half a second, a second or more in a
loaded system.

To deal with this, the current design is non-blocking all the way down.
This means that no code (correctly written) ever blocks on a mutex (we use 
try_lock) which
is the origin of that AIO code.

Now, threading and scheduling has dramatically improved and the problems
that motivated
that design decision have (most probably :) gone away.

It would be dramatically easier to do away with this requirement and write
the code with
small critical sections and just block if we miss a mutex.  First, that is
the way most
programs are written these days and second, having to back out of the entire
stack, saving
context is a huge burden.  It makes the code hard to read, write and debug.

The upshot is that my suggestion to just add a 10 msec timeout is for the
short term.
I'd like to see a blocking, small critical section AIO/Cache/RamCache patch
performance
tested against the existing code *then* pull the trigger on such a new
design and fix the
offending code (including the AIO race).

I plan to work on such a patch and would be very interested in talking to
anyone who might be interested in such a thing as well.
{quote}

 AIO Race condition on non NT systems
 

 Key: TS-947
 URL: https://issues.apache.org/jira/browse/TS-947
 Project: Traffic Server
  Issue Type: Bug
  Components: Core
 Environment: stock build with static libts, running on a 4 core server
Reporter: B Wyatt
Assignee: John Plevyak
 Fix For: 3.1.2

 Attachments: lock-safe-AIO.patch


 Refer to code below.  The timeslice starting when a consumer thread 
 determines that the temp_list is empty (A) and ending when it releases the 
 aio_mutex(C) is unsafe if the work queues are empty and it breaks loop 
 execution at B.  During this timeslice (A-C) the consumer holds the aio_mutex 
 and as a result request producers enqueue items on the temporary atomic list 
 (D).  As a consumer in this state will wait for a signal on aio_cond to 
 proceed before processing the temp_list again, any requests on the temp_list 
 are effectively stalled until a future request produces this signal or 
 manually processes the temp_list.
 In the case of cache volume initialization, there is no future request and 
 the initialization sequence soft locks. 
 {code:title=iocore/aio/AIO.cc(annotated)}
 void *
 aio_thread_main(void *arg)
 {
   ...
   ink_mutex_acquire(my_aio_req-aio_mutex);
   for (;;) {
 do {
   current_req = my_aio_req;
   /* check if any pending requests on the atomic list */
 A  if (!INK_ATOMICLIST_EMPTY(my_aio_req-aio_temp_list))
 aio_move(my_aio_req);
   if (!(op = my_aio_req-aio_todo.pop())  !(op =
 my_aio_req-http_aio_todo.pop()))
 Bbreak;
   ...
   service request
   ...
 } while (1);
 Cink_cond_wait(my_aio_req-aio_cond, my_aio_req-aio_mutex);
   }
   ...
 }
 static void
 aio_queue_req(AIOCallbackInternal *op, int fromAPI = 0)
 {
   ...
   if (!ink_mutex_try_acquire(req-aio_mutex)) {
 Dink_atomiclist_push(req-aio_temp_list, op);
   } else {
 /* check if any pending requests on the atomic list */
 if (!INK_ATOMICLIST_EMPTY(req-aio_temp_list))
   aio_move(req);
 /* now put the new request */
 aio_insert(op, req);
 ink_cond_signal(req-aio_cond);
 ink_mutex_release(req-aio_mutex);
   }
   ...
 }
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (TS-947) AIO Race condition on non NT systems

2011-09-19 Thread mohan_zl (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13107719#comment-13107719
 ] 

mohan_zl commented on TS-947:
-

One question:
The variable sleep_wait in struct AIOThreadInfo is not used now, why not nuke 
it away or use it for Jplevyak's patch, looks like:

timespec ten_msec_timespec = ink_based_hrtime_to_timespec(ink_get_hrtime() + 
HRTIME_MSECONDS(thr_info-sleep_wait));

 AIO Race condition on non NT systems
 

 Key: TS-947
 URL: https://issues.apache.org/jira/browse/TS-947
 Project: Traffic Server
  Issue Type: Bug
  Components: Core
 Environment: stock build with static libts, running on a 4 core server
Reporter: B Wyatt
 Fix For: 3.1.1

 Attachments: lock-safe-AIO.patch


 Refer to code below.  The timeslice starting when a consumer thread 
 determines that the temp_list is empty (A) and ending when it releases the 
 aio_mutex(C) is unsafe if the work queues are empty and it breaks loop 
 execution at B.  During this timeslice (A-C) the consumer holds the aio_mutex 
 and as a result request producers enqueue items on the temporary atomic list 
 (D).  As a consumer in this state will wait for a signal on aio_cond to 
 proceed before processing the temp_list again, any requests on the temp_list 
 are effectively stalled until a future request produces this signal or 
 manually processes the temp_list.
 In the case of cache volume initialization, there is no future request and 
 the initialization sequence soft locks. 
 {code:title=iocore/aio/AIO.cc(annotated)}
 void *
 aio_thread_main(void *arg)
 {
   ...
   ink_mutex_acquire(my_aio_req-aio_mutex);
   for (;;) {
 do {
   current_req = my_aio_req;
   /* check if any pending requests on the atomic list */
 A  if (!INK_ATOMICLIST_EMPTY(my_aio_req-aio_temp_list))
 aio_move(my_aio_req);
   if (!(op = my_aio_req-aio_todo.pop())  !(op =
 my_aio_req-http_aio_todo.pop()))
 Bbreak;
   ...
   service request
   ...
 } while (1);
 Cink_cond_wait(my_aio_req-aio_cond, my_aio_req-aio_mutex);
   }
   ...
 }
 static void
 aio_queue_req(AIOCallbackInternal *op, int fromAPI = 0)
 {
   ...
   if (!ink_mutex_try_acquire(req-aio_mutex)) {
 Dink_atomiclist_push(req-aio_temp_list, op);
   } else {
 /* check if any pending requests on the atomic list */
 if (!INK_ATOMICLIST_EMPTY(req-aio_temp_list))
   aio_move(req);
 /* now put the new request */
 aio_insert(op, req);
 ink_cond_signal(req-aio_cond);
 ink_mutex_release(req-aio_mutex);
   }
   ...
 }
 {code}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (TS-947) AIO Race condition on non NT systems

2011-09-07 Thread B Wyatt (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13099273#comment-13099273
 ] 

B Wyatt commented on TS-947:


I am testing a patch that removes the use of the atomic temp_list entirely.  
Instead it relies on acquiring the aio_mutex for every request enqueue.  This 
may be sub-optimal, but it is preferable to occasional soft-locking on 
initialization or request delays.

I have attached the patch.

 AIO Race condition on non NT systems
 

 Key: TS-947
 URL: https://issues.apache.org/jira/browse/TS-947
 Project: Traffic Server
  Issue Type: Bug
  Components: Core
 Environment: stock build with static libts, running on a 4 core server
Reporter: B Wyatt

 Refer to code below.  The timeslice starting when a consumer thread 
 determines that the temp_list is empty (A) and ending when it releases the 
 aio_mutex(C) is unsafe if the work queues are empty and it breaks loop 
 execution at B.  During this timeslice (A-C) the consumer holds the aio_mutex 
 and as a result request producers enqueue items on the temporary atomic list 
 (D).  As a consumer in this state will wait for a signal on aio_cond to 
 proceed before processing the temp_list again, any requests on the temp_list 
 are effectively stalled until a future request produces this signal or 
 manually processes the temp_list.
 In the case of cache volume initialization, there is no future request and 
 the initialization sequence soft locks. 
 {code:title=iocore/aio/AIO.cc(annotated)}
 void *
 aio_thread_main(void *arg)
 {
   ...
   ink_mutex_acquire(my_aio_req-aio_mutex);
   for (;;) {
 do {
   current_req = my_aio_req;
   /* check if any pending requests on the atomic list */
 A  if (!INK_ATOMICLIST_EMPTY(my_aio_req-aio_temp_list))
 aio_move(my_aio_req);
   if (!(op = my_aio_req-aio_todo.pop())  !(op =
 my_aio_req-http_aio_todo.pop()))
 Bbreak;
   ...
   service request
   ...
 } while (1);
 Cink_cond_wait(my_aio_req-aio_cond, my_aio_req-aio_mutex);
   }
   ...
 }
 static void
 aio_queue_req(AIOCallbackInternal *op, int fromAPI = 0)
 {
   ...
   if (!ink_mutex_try_acquire(req-aio_mutex)) {
 Dink_atomiclist_push(req-aio_temp_list, op);
   } else {
 /* check if any pending requests on the atomic list */
 if (!INK_ATOMICLIST_EMPTY(req-aio_temp_list))
   aio_move(req);
 /* now put the new request */
 aio_insert(op, req);
 ink_cond_signal(req-aio_cond);
 ink_mutex_release(req-aio_mutex);
   }
   ...
 }
 {code}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira