Re: mixed usage of lock protection and lock-free List template class in thread.h
PT_S4_+182>: mov 0x10(%rax),%rax 0x18021ba0a <_Z11List_removeI13pthread_mutexEvR10fast_mutexRPT_S4_+186>: mov 0x10(%rax),%rdx 0x18021ba0e <_Z11List_removeI13pthread_mutexEvR10fast_mutexRPT_S4_+190>: mov -0x8(%rbp),%rax (gdb) x/gx $rbp-8 #cur 0xff3fca98: 0x000601e147c0 (gdb) x/gx 0x000601e147c0+0x10 #cur->next 0x601e147d0: 0x000601e147c0 #identical to cur, into infinite-loop refer to the source code in thread.h: template inline void List_remove (fast_mutex &mx, list_node *&head, list_node *node) { if (!node) return; mx.lock (); if (head) { if (InterlockedCompareExchangePointer ((PVOID volatile *) &head, node->next, node) != node) { list_node *cur = head; while (cur->next && node != cur->next) cur = cur->next; if (node == cur->next) cur->next = cur->next->next; } } mx.unlock (); } Re-paste the sample code test-thread.cpp to demo the infinite loop (I fixed the space deletion issue in my yahoo email): #include #include #include #include #include #include #include #include #include #include #include #include #include #include using namespace std; template class Queue { typedef std::deque Container; public: Queue(int _capacity = std::numeric_limits::max()) : capacity(_capacity), closed(false) {} bool enqueue(T* d) { if (closed) return false; std::unique_lock lock(mutex); if (d == 0) { closed = true; empty_cv.notify_all(); return true; } while (data.size() >= capacity) full_cv.wait(lock); data.push_back(d); empty_cv.notify_one(); return true; } T* dequeue() { std::unique_lock lock(mutex); while (data.empty() && !closed) empty_cv.wait(lock); if (data.size()) { T* d = data.front(); data.pop_front(); full_cv.notify_one(); return d; } else return 0; } size_t size() const { return data.size(); } private: std::mutex mutex; std::condition_variable full_cv, empty_cv; uint32_t capacity; bool closed; Container data; }; struct Node { int data; }; struct Job { Node* node; Queue* recycle; }; static Queue jobs; static void* handle_job(void* arg) { long ithr = (long)arg; unsigned long seed = time(0) + ithr*100; int NS = 1000; while (Job* j = jobs.dequeue()) { struct timespec ts; ts.tv_sec = 0; seed = seed * 1103515245 + 12345; ts.tv_nsec = seed%NS; nanosleep(&ts, 0); j->recycle->enqueue(j->node); delete j; } } struct Task { Queue recycle; int size; // number of sub jobs Task(int N) : size(N) { for (int i = 0; idata = i; Job* job = new Job; job->node = t; job->recycle = &recycle; jobs.enqueue(job); } } ~Task() { int i = 0; while (Node* t = recycle.dequeue()) { delete t; if (++i == size) break; } } }; static string timestamp() { time_t t; struct tm tmp; char buf[80]; t = time(NULL); localtime_r(&t, &tmp); strftime(buf, sizeof(buf), "%d-%m-%Y %I:%M:%S", &tmp); return buf; } static void* create_task(void* arg) { long ithr = (long)arg; int TASK_NUM = 100; int one_percent = TASK_NUM/100; int TASK_SUB_JOB_NUM = 1; int NS = 1000; unsigned long seed = time(0) + ithr*1; int i = 0; for (; i < TASK_NUM; ++i) { struct timespec ts; ts.tv_sec = 0; seed = seed * 1103515245 + 12345; ts.tv_nsec = seed%NS; nanosleep(&ts, 0); if (i%one_percent == 0) { fprintf(stderr, "%s: create_task[%d]: %d%% done\n", timestamp().c_str(), ithr, i/one_percent); } Task task(TASK_SUB_JOB_NUM); } fprintf(stderr, "%s: create_task[%d]: %d%% done\n", timestamp().c_str(), ithr, i/one_percent); } int main() { int NTHR_HANDLE_JOB = 4, NTHR_CREATE_TASK = 4; std::vector threads(NTHR_HANDLE_JOB+NTHR_CREATE_TASK); int k = 0; for (long i = 0; i < NTHR_HANDLE_JOB; ++i) { pthread_create(&threads[k++], NULL, handle_job, (void*)i); } for (long i = 0; i < NTHR_CREATE_TASK; ++i) { pthread_create(&threads[k++], NULL, create_task, (void*)i); } // wait for create_task thread for (size_t i = NTHR_HANDLE_JOB; i < threads.size(); ++i) { pthread_join(threads[i], NULL); } jobs.enqueue(0); // wait for handle_job thread for (size_t i = 0; i < NTHR_HANDLE_JOB; ++i) pthread_join(threads[i], NULL); } On F
Re: mixed usage of lock protection and lock-free List template class in thread.h
On Feb 14 07:37, Xiaofeng Liu via cygwin wrote: > Sorry, I need send again for another try for the formatting. > > > (Yahoo deleted my spaces. Please reformat the code in your C++ editor > if you want to use the code. Sorry!) You may want to use a good old MUA like thunderbird or mutt :} > Here is the sample code that will hang in cygwin: test-thread.cpp > to compile: > g++ -std=c++0x test-thread.cpp -lpthread > In this code, mutex and cond need be created and destroyed very > frequently, which could corrupt the static list object owned by some > classes in thread.h. In my test, I have a computer of 8 threads to run > cygwin, and the hang could happen when cond/mutex objects are created > and destroyed for the order of 1 millions times within a few minutes. > Is this practical? Yes. My code for my product used a lot of > std::future which use one mutex and one mutex for each object. The > future objects are created and destroyed on the flight, so are for > mutex and cond variables. I can also observe that the peak memory > kept increasing to a few hundred MB, and I suspect there is a MEMORY > LEAK in cygwin kernel. > I hope the format will be good. If not, I will try again. The leading spaces are still gone but at least the linefeeds have been retained, so all is good. Note that this is not sufficient for patch submissions. Make sure to send them as attachment as long as you're using a broken MUA. Thanks for the example. If you can come up with a patch for a lockless implementation this would be highly appreciated. Thanks, Corinna -- Corinna Vinschen Please, send mails regarding Cygwin to Cygwin Maintainer cygwin AT cygwin DOT com Red Hat signature.asc Description: PGP signature
Re: mixed usage of lock protection and lock-free List template class in thread.h
Sorry, I need send again for another try for the formatting. (Yahoo deleted my spaces. Please reformat the code in your C++ editor if you want to use the code. Sorry!) Here is the sample code that will hang in cygwin: test-thread.cpp to compile: g++ -std=c++0x test-thread.cpp -lpthread In this code, mutex and cond need be created and destroyed very frequently, which could corrupt the static list object owned by some classes in thread.h. In my test, I have a computer of 8 threads to run cygwin, and the hang could happen when cond/mutex objects are created and destroyed for the order of 1 millions times within a few minutes. Is this practical? Yes. My code for my product used a lot of std::future which use one mutex and one mutex for each object. The future objects are created and destroyed on the flight, so are for mutex and cond variables. I can also observe that the peak memory kept increasing to a few hundred MB, and I suspect there is a MEMORY LEAK in cygwin kernel. I hope the format will be good. If not, I will try again. Thanks. Xiaofeng -test-thread.cpp--- #include #include #include #include #include #include #include #include #include #include #include #include #include #include using namespace std; template class Queue { typedef std::deque Container; public: Queue(int _capacity = std::numeric_limits::max()) : capacity(_capacity), closed(false) {} bool enqueue(T* d) { if (closed) return false; std::unique_lock lock(mutex); if (d == 0) { closed = true; empty_cv.notify_all(); return true; } while (data.size() >= capacity) full_cv.wait(lock); data.push_back(d); empty_cv.notify_one(); return true; } T* dequeue() { std::unique_lock lock(mutex); while (data.empty() && !closed) empty_cv.wait(lock); if (data.size()) { T* d = data.front(); data.pop_front(); full_cv.notify_one(); return d; } else return 0; } size_t size() const { return data.size(); } private: std::mutex mutex; std::condition_variable full_cv, empty_cv; uint32_t capacity; bool closed; Container data; }; struct Node { int data; }; struct Job { Node* node; Queue* recycle; }; static Queue jobs; static void* handle_job(void* arg) { long ithr = (long)arg; unsigned long seed = time(0) + ithr*100; int NS = 1000; while (Job* j = jobs.dequeue()) { struct timespec ts; ts.tv_sec = 0; seed = seed * 1103515245 + 12345; ts.tv_nsec = seed%NS; nanosleep(&ts, 0); j->recycle->enqueue(j->node); delete j; } } struct Task { Queue recycle; int size; // number of sub jobs Task(int N) : size(N) { for (int i = 0; idata = i; Job* job = new Job; job->node = t; job->recycle = &recycle; jobs.enqueue(job); } } ~Task() { int i = 0; while (Node* t = recycle.dequeue()) { delete t; if (++i == size) break; } } }; static string timestamp() { time_t t; struct tm tmp; char buf[80]; t = time(NULL); localtime_r(&t, &tmp); strftime(buf, sizeof(buf), "%d-%m-%Y %I:%M:%S", &tmp); return buf; } static void* create_task(void* arg) { long ithr = (long)arg; int TASK_NUM = 100; int one_percent = TASK_NUM/100; int TASK_SUB_JOB_NUM = 1; int NS = 1000; unsigned long seed = time(0) + ithr*1; int i = 0; for (; i < TASK_NUM; ++i) { struct timespec ts; ts.tv_sec = 0; seed = seed * 1103515245 + 12345; ts.tv_nsec = seed%NS; nanosleep(&ts, 0); if (i%one_percent == 0) { fprintf(stderr, "%s: create_task[%d]: %d%% done\n", timestamp().c_str(), ithr, i/one_percent); } Task task(TASK_SUB_JOB_NUM); } fprintf(stderr, "%s: create_task[%d]: %d%% done\n", timestamp().c_str(), ithr, i/one_percent); } int main() { int NTHR_HANDLE_JOB = 4, NTHR_CREATE_TASK = 4; std::vector threads(NTHR_HANDLE_JOB+NTHR_CREATE_TASK); int k = 0; for (long i = 0; i < NTHR_HANDLE_JOB; ++i) { pthread_create(&threads[k++], NULL, handle_job, (void*)i); } for (long i = 0; i < NTHR_CREATE_TASK; ++i) { pthread_create(&threads[k++], NULL, create_task, (void*)i); } // wait for create_task thread for (size_t i = NTHR_HANDLE_JOB; i < threads.size(); ++i) { pthread_join(threads[i], NULL); } jobs.enqueue(0); // wait for handle_job thread for (size_t i = 0; i < NTHR_HANDLE_JOB; ++i) pthread_join(threads[i], NULL); } end of test-thread.cpp---------- To: cygwin@cygwin.com Sent: Friday, December 1, 2017 9:15 AM Subject: Re: mixed usage of lock protection and lock-free List template class in thread.h On Dec 1 16:45, Xiaofeng Liu via cygwin wrote: > Lock protection and lock-free should never be mixed ! > https://cygwin.com/git/gitweb.cgi?p=newlib-cygwin.git;a=blob;f=winsup/cygwin/thread.h;hb=1f42dc2bcf58d3b8629eb13d53de3f69fc314b47#l110 > > 110 template inline void 111 List_insert (list_node > *&head, list_node *node) 112 { 113
Re: mixed usage of lock protection and lock-free List template class in thread.h
Sorry I have to reformat again. Here is the sample code that will hang in cygwin: test-thread.cpp to compile: g++ -std=c++0x test-thread.cpp -lpthread #include #include #include #include #include #include #include #include #include #include #include #include #include #include using namespace std; templateclass Queue { typedef std::deque Container;public: Queue(int _capacity = std::numeric_limits::max()) : capacity(_capacity), closed(false) {} bool enqueue(T* d) { if (closed) return false; std::unique_lock lock(mutex); if (d == 0) { closed = true; empty_cv.notify_all(); return true; } while (data.size() >= capacity) full_cv.wait(lock); data.push_back(d); empty_cv.notify_one(); return true; } T* dequeue() { std::unique_lock lock(mutex); while (data.empty() && !closed) empty_cv.wait(lock); if (data.size()) { T* d = data.front(); data.pop_front(); full_cv.notify_one(); return d; } else return 0; } size_t size() const { return data.size(); } private: std::mutex mutex; std::condition_variable full_cv, empty_cv; uint32_t capacity; bool closed; Container data;}; struct Node { int data;}; struct Job { Node* node; Queue* recycle;}; static Queue jobs; static void* handle_job(void* arg){ long ithr = (long)arg; unsigned long seed = time(0) + ithr*100; int NS = 1000; while (Job* j = jobs.dequeue()) { struct timespec ts; ts.tv_sec = 0; seed = seed * 1103515245 + 12345; ts.tv_nsec = seed%NS; nanosleep(&ts, 0); j->recycle->enqueue(j->node); delete j; }} struct Task { Queue recycle; int size; // number of sub jobs Task(int N) : size(N) { for (int i = 0; idata = i; Job* job = new Job; job->node = t; job->recycle = &recycle; jobs.enqueue(job); } } ~Task() { int i = 0; while (Node* t = recycle.dequeue()) { delete t; if (++i == size) break; } }}; static string timestamp(){ time_t t; struct tm tmp; char buf[80]; t = time(NULL); localtime_r(&t, &tmp); strftime(buf, sizeof(buf), "%d-%m-%Y %I:%M:%S", &tmp); return buf;} static void* create_task(void* arg){ long ithr = (long)arg; int TASK_NUM = 100; int one_percent = TASK_NUM/100; int TASK_SUB_JOB_NUM = 1; int NS = 1000; unsigned long seed = time(0) + ithr*1; int i = 0; for (; i < TASK_NUM; ++i) { struct timespec ts; ts.tv_sec = 0; seed = seed * 1103515245 + 12345; ts.tv_nsec = seed%NS; nanosleep(&ts, 0); if (i%one_percent == 0) { fprintf(stderr, "%s: create_task[%d]: %d%% done\n", timestamp().c_str(), ithr, i/one_percent); } Task task(TASK_SUB_JOB_NUM); } fprintf(stderr, "%s: create_task[%d]: %d%% done\n", timestamp().c_str(), ithr, i/one_percent);} int main(){ int NTHR_HANDLE_JOB = 4, NTHR_CREATE_TASK = 4; std::vector threads(NTHR_HANDLE_JOB+NTHR_CREATE_TASK); int k = 0; for (long i = 0; i < NTHR_HANDLE_JOB; ++i) { pthread_create(&threads[k++], NULL, handle_job, (void*)i); } for (long i = 0; i < NTHR_CREATE_TASK; ++i) { pthread_create(&threads[k++], NULL, create_task, (void*)i); } // wait for create_task thread for (size_t i = NTHR_HANDLE_JOB; i < threads.size(); ++i) { pthread_join(threads[i], NULL); } jobs.enqueue(0); // wait for handle_job thread for (size_t i = 0; i < NTHR_HANDLE_JOB; ++i) pthread_join(threads[i], NULL);} In this code, mutex and cond need be created and destroyed very frequently, which could corrupt the static list object owned by some classes in thread.h. In my test, I have a computer of 8 threads to run cygwin, and the hang could happen when cond/mutex objects are created and destroyed for the order of 1 millions times within a few minutes. I can also observe that the peak memory kept increasing to a few hundred MB, and I suspect there is a MEMORY LEAK in cygwin kernel. I hope the format will be good. If not, I will try again. Thanks. Xiaofeng From: Corinna Vinschen To: cygwin@cygwin.com Sent: Friday, December 1, 2017 9:15 AM Subject: Re: mixed usage of lock protection and lock-free List template class in thread.h On Dec 1 16:45, Xiaofeng Liu via cygwin wrote: > Lock protection and lock-free should never be mixed ! > https://cygwin.com/git/gitweb.cgi?p=newlib-cygwin.git;a=blob;f=winsup/cygwin/thread.h;hb=1f42dc2bcf58d3b8629eb13d53de3f69fc314b47#l110 > > 110 template inline void 111 List_insert (list_node > *&head, list_node *node) 112 { 113 if (!node) 114 retur
Re: mixed usage of lock protection and lock-free List template class in thread.h
Here is the sample code that will hang in cygwin: test-thread.cpp to compile: g++ -std=c++0x test-thread.cpp -lpthread #include #include #include #include #include #include #include #include #include #include #include #include #include #include using namespace std; templateclass Queue { typedef std::deque Container;public: Queue(int _capacity = std::numeric_limits::max()) : capacity(_capacity), closed(false) {} bool enqueue(T* d) { if (closed) return false; std::unique_lock lock(mutex); if (d == 0) { closed = true; empty_cv.notify_all(); return true; } while (data.size() >= capacity) full_cv.wait(lock); data.push_back(d); empty_cv.notify_one(); return true; } T* dequeue() { std::unique_lock lock(mutex); while (data.empty() && !closed) empty_cv.wait(lock); if (data.size()) { T* d = data.front(); data.pop_front(); full_cv.notify_one(); return d; } else return 0; } size_t size() const { return data.size(); } private: std::mutex mutex; std::condition_variable full_cv, empty_cv; uint32_t capacity; bool closed; Container data;}; struct Node { int data;}; struct Job { Node* node; Queue* recycle;}; static Queue jobs; static void* handle_job(void* arg){ long ithr = (long)arg; unsigned long seed = time(0) + ithr*100; int NS = 1000; while (Job* j = jobs.dequeue()) { struct timespec ts; ts.tv_sec = 0; seed = seed * 1103515245 + 12345; ts.tv_nsec = seed%NS; nanosleep(&ts, 0); j->recycle->enqueue(j->node); delete j; }} struct Task { Queue recycle; int size; // number of sub jobs Task(int N) : size(N) { for (int i = 0; idata = i; Job* job = new Job; job->node = t; job->recycle = &recycle; jobs.enqueue(job); } } ~Task() { int i = 0; while (Node* t = recycle.dequeue()) { delete t; if (++i == size) break; } }}; static string timestamp(){ time_t t; struct tm tmp; char buf[80]; t = time(NULL); localtime_r(&t, &tmp); strftime(buf, sizeof(buf), "%d-%m-%Y %I:%M:%S", &tmp); return buf;} static void* create_task(void* arg){ long ithr = (long)arg; int TASK_NUM = 100; int one_percent = TASK_NUM/100; int TASK_SUB_JOB_NUM = 1; int NS = 1000; unsigned long seed = time(0) + ithr*1; int i = 0; for (; i < TASK_NUM; ++i) { struct timespec ts; ts.tv_sec = 0; seed = seed * 1103515245 + 12345; ts.tv_nsec = seed%NS; nanosleep(&ts, 0); if (i%one_percent == 0) { fprintf(stderr, "%s: create_task[%d]: %d%% done\n", timestamp().c_str(), ithr, i/one_percent); } Task task(TASK_SUB_JOB_NUM); } fprintf(stderr, "%s: create_task[%d]: %d%% done\n", timestamp().c_str(), ithr, i/one_percent);} int main(){ int NTHR_HANDLE_JOB = 4, NTHR_CREATE_TASK = 4; std::vector threads(NTHR_HANDLE_JOB+NTHR_CREATE_TASK); int k = 0; for (long i = 0; i < NTHR_HANDLE_JOB; ++i) { pthread_create(&threads[k++], NULL, handle_job, (void*)i); } for (long i = 0; i < NTHR_CREATE_TASK; ++i) { pthread_create(&threads[k++], NULL, create_task, (void*)i); } // wait for create_task thread for (size_t i = NTHR_HANDLE_JOB; i < threads.size(); ++i) { pthread_join(threads[i], NULL); } jobs.enqueue(0); // wait for handle_job thread for (size_t i = 0; i < NTHR_HANDLE_JOB; ++i) pthread_join(threads[i], NULL);} In this code, mutex and cond need be created and destroyed very frequently, which could corrupt the static list object owned by some classes in thread.h. In my test, I have a computer of 8 threads to run cygwin, and the hang could happen when cond/mutex objects are created and destroyed for the order of 1 millions times within a few minutes. I can also observe that the peak memory kept increasing to a few hundred MB, and I suspect there is a MEMORY LEAK in cygwin kernel. I hope the format will be good. If not, I will try again. Thanks. From: Corinna Vinschen To: cygwin@cygwin.com Sent: Friday, December 1, 2017 9:15 AM Subject: Re: mixed usage of lock protection and lock-free List template class in thread.h On Dec 1 16:45, Xiaofeng Liu via cygwin wrote: > Lock protection and lock-free should never be mixed ! > https://cygwin.com/git/gitweb.cgi?p=newlib-cygwin.git;a=blob;f=winsup/cygwin/thread.h;hb=1f42dc2bcf58d3b8629eb13d53de3f69fc314b47#l110 > > 110 template inline void 111 List_insert (list_node > *&head, list_node *node) 112 { 113 if (!node) 114 return; 115 do 116 > node->next = head; 117 while (Int
Re: mixed usage of lock protection and lock-free List template class in thread.h
On Dec 1 16:45, Xiaofeng Liu via cygwin wrote: > Lock protection and lock-free should never be mixed ! > https://cygwin.com/git/gitweb.cgi?p=newlib-cygwin.git;a=blob;f=winsup/cygwin/thread.h;hb=1f42dc2bcf58d3b8629eb13d53de3f69fc314b47#l110 > > 110 template inline void 111 List_insert (list_node > *&head, list_node *node) 112 { 113 if (!node) 114 return; 115 do 116 > node->next = head; 117 while (InterlockedCompareExchangePointer ((PVOID > volatile *) &head, 118 node, > node->next) != node->next); 119 } 120 121 template inline > void 122 List_remove (fast_mutex &mx, list_node *&head, list_node *node) 123 > { 124 if (!node) 125 return; 126 mx.lock (); 127 if (head) 128 > { 129 if (InterlockedCompareExchangePointer ((PVOID volatile *) &head, > 130 node->next, node) != node) > 131 { 132 list_node *cur = head; 133 134 while > (cur->next && node != cur->next) 135 cur = cur->next; 136 > if (node == cur->next) 137 cur->next = cur->next->next; 138 > } 139 } 140 mx.unlock (); 141 } > The symptom I met is a job hang with the following stack: > #0 0x7711c2ea in ntdll!ZwWaitForMultipleObjects () from > /cygdrive/c/Windows/SYSTEM32/ntdll.dll > #1 0x07fefd111430 in KERNELBASE!GetCurrentProcess () from > /cygdrive/c/Windows/system32/KERNELBASE.dll > #2 0x76fc06c0 in WaitForMultipleObjects () from > /cygdrive/c/Windows/system32/kernel32.dll > #3 0x0001800458ac in cygwait(void*, _LARGE_INTEGER*, unsigned int) () > from /usr/bin/cygwin1.dll > #4 0x00018013d029 in pthread_cond::~pthread_cond() () from > /usr/bin/cygwin1.dll > #5 0x00018013d0dd in pthread_cond::~pthread_cond() () from > /usr/bin/cygwin1.dll > #6 0x000180141196 in pthread_cond_destroy () from /usr/bin/cygwin1.dll > #7 0x000180116d5b in _sigfe () from /usr/bin/cygwin1.dll > #8 0x000100908e38 in > std::_Sp_counted_ptr_inplace ()>, std::allocator, void ()>, std::allocator, > (__gnu_cxx::_Lock_policy)2>::_M_dispose() () > The problem with the current implementation for concurrent insert and delete > is explained at WikipediaNon-blocking linked list > > My question is how to solve this ? Adding lock protection in List_insert > (removing lock-freee) or implementing a complete lock-free List based on > Harris's solution to use two CAS? First of all, please, please, please fix your MUA! Just like your mails to cygwin-patches a couple of weeks ago, your mails are pretty much unreadable due to broken line wrapping. Back to business: This code is working since 2003. So, is that just a theoretical problem, or a practical one? If the latter, what is broken exactly? However, since you're asking, a lockless implementation where appropriate is always welcome. Thanks, Corinna -- Corinna Vinschen Please, send mails regarding Cygwin to Cygwin Maintainer cygwin AT cygwin DOT com Red Hat signature.asc Description: PGP signature