Re: STDCXX-1066 [was: Re: STDCXX forks]
On Mon, Sep 24, 2012 at 8:21 AM, Liviu Nicoara nikko...@hates.ms wrote: In the light of your inability to answer the simplest questions about the correctness and usefulness of this patch, I propose we strike the patch in its entirety. Let me make something very clear to you: what I am doing here is a courtesy to the stdcxx project. There is no requirement in my job description to waste hours arguing with you in pointless squabbles over email. Nor is there a requirement in the APL V2.0 which would somehow compel us to redistribute our source code changes. We could and should re-work the instances in the library where we might use mutex and condition objects that are misaligned wrt the mentioned kernel update. Yeah, why don't you go ahead and do that. Just like you fixed stdcxx-1056. --Stefan -- Stefan Teleman KDE e.V. stefan.tele...@gmail.com
RE: STDCXX-1066 [was: Re: STDCXX forks]
Comments below...

-Original Message-
From: Stefan Teleman
Sent: Monday, September 24, 2012 5:46 AM
Subject: Re: STDCXX-1066 [was: Re: STDCXX forks]

On Mon, Sep 24, 2012 at 8:21 AM, Liviu Nicoara nikko...@hates.ms wrote: In the light of your inability to answer the simplest questions about the correctness and usefulness of this patch, I propose we strike the patch in its entirety. Let me make something very clear to you: what I am doing here is a courtesy to the stdcxx project.

Thank you for your contribution. There is a good chance that without your activity on this project over the last few months it would be in the attic.

There is no requirement in my job description to waste hours arguing with you in pointless squabbles over email. Nor is there a requirement in the APL V2.0 which would somehow compel us to redistribute our source code changes.

The problem here is that code changes are expected to pass review. They must follow established conventions, solve the problem in a reasonable and portable way, provide a test case (where one doesn't already exist), and make sense to everyone who is working with the product. Several of the changes included don't make sense to the rest of us. Unless we manage to figure them out on our own or you manage to explain them to us, the changes should probably be excluded from the code. For example, the changes to src/exception.cpp don't make sense to me. Here is the relevant diff...

+#if defined(_RWSTD_STRICT_SPARCV8_MUTEX_ALIGNMENT)
+#  pragma pack(8)
+#  pragma align 8(__rw_aligned_buffer)
+#endif
+
 // facet must never be destroyed
 static __rw_aligned_buffer<_Msgs> msgs;

It seems that we are trying to give 8-byte alignment to an object of type __rw_aligned_buffer. The point of __rw_aligned_buffer is to give us a block of properly aligned data, and if it isn't aligned, then it is broken.
So, why are we using a pragma for alignment here (and potentially in other places) when it seems that there is a bug in __rw_aligned_buffer that we should be addressing? Of course, if this is a problem with the aligned buffer, we need a testcase.

Another example is the set of changes to src/ctype.cpp...

+#if defined(_RWSTD_STRICT_SPARCV8_MUTEX_ALIGNMENT)
+#  pragma pack(8)
+#  pragma align 8(f)
+#  pragma align 8(pf)
+#endif
 static union {
+#if defined(_RWSTD_STRICT_SPARCV8_MUTEX_ALIGNMENT)
+    unsigned long long align_;
+    unsigned char data_ [sizeof (_STD::ctype<char>)];
+#else
     void *align_;
     char data_ [sizeof (_STD::ctype<char>)];
+#endif
 } f;

 static __rw_facet* const pf = new (&f) _STD::ctype<char>(__rw_classic_tab, false, ref);
 pfacet = pf;

+#if defined(_RWSTD_STRICT_SPARCV8_MUTEX_ALIGNMENT)
+#  pragma pack(0)
+#endif

This change seems to give alignment to `f' in two different ways, and it seems to be applying alignment to the pointer as well (the issue Liviu has brought up). A simpler fix would be to unconditionally add union members to `f' so that we get the proper alignment (much like we try to do in __rw_aligned_buffer), and an even better solution would be to use __rw_aligned_buffer itself.

We could and should re-work the instances in the library where we might use mutex and condition objects that are misaligned wrt the mentioned kernel update.

Yeah, why don't you go ahead and do that. Just like you fixed stdcxx-1056.

There is no need for comments like this. Stefan, if you are feeling singled out and attacked because your patches aren't accepted without question, it might be a good idea to go back and look through the mailing list archives. We have _all_ been in this position. Nearly every patch I posted during my first year working on stdcxx went like this, and I've been doing C++ software for 15 years. The company where I work currently has a review process that is just as rigorous as what you're seeing with stdcxx now.
The process isn't there to attack anyone personally, but to ensure the quality and maintainability of the library for years to come. Travis
Re: STDCXX-1066 [was: Re: STDCXX forks]
On 09/16/12 12:03, Stefan Teleman wrote: On Sat, Sep 15, 2012 at 5:47 PM, Liviu Nicoara nikko...@hates.ms wrote: I merely wanted to point out that restoring the default packing is not the same as restoring the packing to the previous value in effect. Given this, I thought about an alternative way of forcing this alignment, e.g., via a union, aligned on an appropriate type. I see an advantage here in that most of the changes would occur where we define the 'primitive' mutex and condition wrappers, saving a few pre-processor conditionals and pragmas along the way.

I understood what you were saying. I just don't understand under what _sane_ circumstances a program would #include a system library header file with a previously set packing to something other than default. Or would expect a non-default packing to work during a function call to a system library. That's an insane configuration on any operating system that I know of, not just on SPARCV8.

I can't think of a valid scenario, either. I guess the point is moot. A few more questions, if you will, as I am going through the changes:

1. I see similarities with 1040, should/would you close that one?
2. The issue only exists in MT builds, should there be a guard in the configs?
3. The align reference docs talk only about aligning variables, not types. Is that different on SPARC?
4. I see rw/_mutex.h has alignment pragmas for both the __rw_mutex_base class and its mutex member; same for __rw_static_mutex and its static member, etc. How does that work?
5. Why is __rw_guard aligned explicitly? I see it only contains a pointer to a mutex object.
6. The docs mention that the pragma must use the mangled variable names but I don't see that in the patch.

Thanks! Liviu
Re: STDCXX-1066 [was: Re: STDCXX forks]
On Sun, Sep 23, 2012 at 3:26 PM, Liviu Nicoara nikko...@hates.ms wrote: To be honest it's quite bizarre that you cannot share that with us. Is it really a trade secret? How can that be the case if Oracle customers are also required to perform the same alignment, perhaps using the same techniques like you showed in the patch? That's the problem. I don't know what is and what is not a trade secret, or copyrighted information, or unauthorized intellectual property disclosure anymore. I think it's in the eye of the beholder. At Sun it was very clear. Believe it or not, I had to get written approval from the Legal Counsel's Office in order to be able to share these patches. And that in spite of the fact that these patches are published, and have already been published for *years*. IANAL and I don't want to play one on TV. ;-) --Stefan -- Stefan Teleman KDE e.V. stefan.tele...@gmail.com
Re: STDCXX-1066 [was: Re: STDCXX forks]
On 9/23/12 3:48 PM, Stefan Teleman wrote: On Sun, Sep 23, 2012 at 3:26 PM, Liviu Nicoara nikko...@hates.ms wrote: To be honest it's quite bizarre that you cannot share that with us. Is it really a trade secret? How can that be the case if Oracle customers are also required to perform the same alignment, perhaps using the same techniques like you showed in the patch? That's the problem. I don't know what is and what is not a trade secret, or copyrighted information, or unauthorized intellectual property disclosure anymore. I think it's in the eye of the beholder. At Sun it was very clear. Believe it or not, I had to get written approval from the Legal Counsel's Office in order to be able to share these patches. And that in spite of the fact that these patches are published, and have already been published for *years*. That clarifies things. Thanks. Liviu
Re: STDCXX-1066 [was: Re: STDCXX forks]
On Sun, Sep 23, 2012 at 5:23 PM, Stefan Teleman stefan.tele...@gmail.com wrote: The second URL says this:

QUOTE
Due to a change in the implementation of the userland mutexes introduced by CR 6296770 in KU 137111-01, objects of type mutex_t and pthread_mutex_t must start at 8-byte aligned addresses. If this requirement is not satisfied, all non-compliant applications on Solaris/SPARC may fail with the signal SEGV with a callstack similar to the following one or with similar callstacks containing the function mutex_trylock_process.

 *_atomic_cas_64(0x141f2c, 0x0, 0xff00, 0x1651, 0xff00, 0x466d90)
 set_lock_byte64(0x0, 0x1651, 0xff00, 0x0, 0xfec82a00, 0x0)
 fast_process_lock(0x141f24, 0x0, 0x1, 0x1, 0x0, 0xfeae5780)
/QUOTE

Here's a link to an official datatype alignment table for SPARCV8:

http://docs.oracle.com/cd/E19205-01/819-5267/bkbkl/index.html

The interesting table is: Table B–2 Storage Sizes and Default Alignments in Bytes

--Stefan -- Stefan Teleman KDE e.V. stefan.tele...@gmail.com
Re: STDCXX-1066 [was: Re: STDCXX forks]
On 9/23/12 5:50 PM, Stefan Teleman wrote: On Sun, Sep 23, 2012 at 5:23 PM, Stefan Teleman stefan.tele...@gmail.com wrote: The second URL says this:

QUOTE
Due to a change in the implementation of the userland mutexes introduced by CR 6296770 in KU 137111-01, objects of type mutex_t and pthread_mutex_t must start at 8-byte aligned addresses. If this requirement is not satisfied, all non-compliant applications on Solaris/SPARC may fail with the signal SEGV with a callstack similar to the following one or with similar callstacks containing the function mutex_trylock_process.

 *_atomic_cas_64(0x141f2c, 0x0, 0xff00, 0x1651, 0xff00, 0x466d90)
 set_lock_byte64(0x0, 0x1651, 0xff00, 0x0, 0xfec82a00, 0x0)
 fast_process_lock(0x141f24, 0x0, 0x1, 0x1, 0x0, 0xfeae5780)
/QUOTE

Here's a link to an official datatype alignment table for SPARCV8: http://docs.oracle.com/cd/E19205-01/819-5267/bkbkl/index.html The interesting table is: Table B–2 Storage Sizes and Default Alignments in Bytes

I see nothing really outstanding here. What is it that I should pay attention to? Thanks, Liviu
Re: STDCXX-1066 [was: Re: STDCXX forks]
On Sun, Sep 23, 2012 at 7:25 PM, Liviu Nicoara nikko...@hates.ms wrote: I am not asking for any other implementation and I am not looking to change anything. I wish you could explain it to us, but in the absence of trade secret details I will take an explanation for the questions above. Sorry, no. This will not be another replay of the stdcxx-1056 email discussion, which was a three week waste of my time. The patch is provided AS IS. No further testing and no further comments. I do have more important things to do. -- Stefan Teleman KDE e.V. stefan.tele...@gmail.com
STDCXX-1056 DCII [was: Re: STDCXX-1056 [was: Re: STDCXX forks]]
On 9/17/12 5:42 PM, Liviu Nicoara wrote: [...] However, after more careful thought, I think there is a problem there even though we don't have an objective proof for it, yet. The writes are not atomic and they behave just like DCII (double-checked initialization), being subject to both compiler reordering and out-of-order execution. E.g., it is assumed that the writes to the _C_impsize and _C_impdata members occur in program order, which would make unguarded reads/tests of _C_impsize safe. This is what the facet initialization and access code does, conceptually:

if (_C_impsize) {
    // already initialized, use data
}
else {
    // initialize
    mutex.lock ();
    _C_impdata = get_data_from_database ();
    _C_impsize = get_data_size ();
    mutex.unlock ();
}

The mutex lock and unlock operations introduce memory barriers, but they do not guarantee the order in which the two writes occur. If the writes occurred in exact program order, unguarded reads and tests of _C_impsize would be safe. Unfortunately, a thread may see the _C_impsize variable being updated before _C_impdata.

Elaborating on the above -- facet.cpp:~250, the code is actually closer to this:

if (_C_impsize) {
    // already initialized, use _C_impdata
}
else {
    // initialize
    mutex.lock ();
    if (_C_impsize)
        return _C_impdata;
    void* const pv = ...;
    size_t sz = ...;
    _C_impdata = pv;   // 1
    _C_impsize = sz;   // 2
    mutex.unlock ();
}

(1) and (2) can be reordered. Also, when mapping the STDCXX locale database, facet.cpp:288, there is another defect writing the two variables:

_C_impdata = __rw_get_facet_data (cat, _C_impsize, 0, codeset);
                                       ^^^
                                       set early in __rw_get_facet_data

The only way to order the above writes, preserve the fast check for initialization, and generally salvage the current algorithm is a full memory barrier between the writes, e.g.:

if (_C_impsize) {
    // already initialized, use data
}
else {
    // ...
    _C_impdata = pv;
    full_membar ();
    _C_impsize = sz;
    // ...
}

Atomics are part of C++11; I wonder if anybody here has working experience with the atomic API. A mutex lock alone will not do, because taking the lock is only an acquire operation, and a write above the lock can sink below it. Of course I could be wrong. Any input is welcome. Thanks, Liviu
Re: STDCXX-1056 [was: Re: STDCXX forks]
On Mon, Sep 17, 2012 at 11:17 AM, Liviu Nicoara nikko...@hates.ms wrote: I hope you agree that this synchronization is sufficient for the facet initialization and reading of facet data.

I have reduced the number of reported race conditions in 22.locale.numpunct.mt from 12440:

http://s247136804.onlinehome.us/stdcxx-1056-SPARC-20120917/22.locale.numpunct.mt.nts.1.er.html/index.html

to 288:

http://s247136804.onlinehome.us/stdcxx-1056-20120918/22.locale.numpunct.mt.5.er.html/index.html

The changes are in the following files:

http://s247136804.onlinehome.us/stdcxx-1056-20120918/facet.cpp
http://s247136804.onlinehome.us/stdcxx-1056-20120918/punct.cpp

_numpunct.h looks like this:

http://s247136804.onlinehome.us/stdcxx-1056-20120918/_numpunct.h

With these changes, no race conditions are reported for any of the functions in std::numpunct<T>. Still, there are 288 race conditions being reported in __rw_locale::__rw_locale and in std::locale::_C_get_facet. We need to identify the source and cause of these race conditions and correct them as well. This is not a complete solution to the problem, because we still have to re-write the chunk of code I eliminated from facet.cpp. It is only step one towards finding a real solution. But, at least for now, we have pinpointed where the source of these race conditions is located, and what is causing it. The test program was run as: ./22.locale.numpunct.mt --nthreads=8 --nloops=1. More to follow. --Stefan -- Stefan Teleman KDE e.V. stefan.tele...@gmail.com
Re: STDCXX-1056 [was: Re: STDCXX forks]
On 09/18/12 08:55, Stefan Teleman wrote: On Mon, Sep 17, 2012 at 11:17 AM, Liviu Nicoara nikko...@hates.ms wrote: I hope you agree that this synchronization is sufficient for the facet initialization and reading of facet data. I have reduced the number of reported race conditions in 22.locale.numpunct.mt from 12440:

I am attaching a test program which, while 100% MT-safe, is flagged by the Solaris thread analyzer. The test program contains, conceptually, the exact same shared-variable access as the locale management code, the facet management code, and the facet data management code. This proves, if there was a need, that the tool results are to be further analyzed and interpreted, not taken at face value. The fix we are looking for is one which corrects the initial MT failure observed in caching the facet data, while preserving the unguarded reads of facet data for performance reasons. Thanks. Liviu

http://s247136804.onlinehome.us/stdcxx-1056-SPARC-20120917/22.locale.numpunct.mt.nts.1.er.html/index.html to 288: http://s247136804.onlinehome.us/stdcxx-1056-20120918/22.locale.numpunct.mt.5.er.html/index.html The changes are in the following files: http://s247136804.onlinehome.us/stdcxx-1056-20120918/facet.cpp http://s247136804.onlinehome.us/stdcxx-1056-20120918/punct.cpp _numpunct.h looks like this: http://s247136804.onlinehome.us/stdcxx-1056-20120918/_numpunct.h With these changes, no race conditions are reported for any of the functions in std::numpunct<T>. Still, there are 288 race conditions being reported in __rw_locale::__rw_locale and in std::locale::_C_get_facet. We need to identify the source and cause of these race conditions and correct them as well. This is not a complete solution to the problem, because we still have to re-write the chunk of code I eliminated from facet.cpp. It is only step one towards finding a real solution.
But, at least for now, we have pinpointed where the source of these race conditions is located, and what is causing it. The test program was run as: ./22.locale.numpunct.mt --nthreads=8 --nloops=1. More to follow. --Stefan

#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <unistd.h>

#define MAX_THREADS 128

static long counter = 0, nloops = 1000, nthreads = 16;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

extern "C" {

static void* f (void* pv)
{
    for (size_t i = 0; i < nloops; ++i) {
        if (counter == 0) {
            pthread_mutex_lock (&lock);
            if (counter == 0)
                ++counter;
            pthread_mutex_unlock (&lock);
        }
        else {
            // counter value is safe to use here
        }
    }

    printf ("%ld\n", counter);
    return 0;
}

}   // extern "C"

int main (int argc, char** argv)
{
    switch (argc) {
    case 3: nloops = atol (argv [2]);
    case 2: nthreads = atol (argv [1]); break;
    }

    pthread_t tid [MAX_THREADS] = { 0 };

    if (nthreads > MAX_THREADS)
        nthreads = MAX_THREADS;

    for (int i = 0; i < nthreads; ++i) {
        if (pthread_create (tid + i, 0, f, 0))
            exit (-1);
    }

    for (int i = 0; i < nthreads; ++i) {
        if (tid [i])
            pthread_join (tid [i], 0);
    }

    return 0;
}
Re: STDCXX-1056 [was: Re: STDCXX forks]
On Tue, Sep 18, 2012 at 12:43 PM, Liviu Nicoara nikko...@hates.ms wrote: I am attaching a test program which, while 100% MT-safe, is flagged by the Solaris thread analyzer.

The program as written is not thread safe. It is reading the value of the counter variable and performing a zero comparison outside of a mutex lock:

for (size_t i = 0; i < nloops; ++i) {
    if (counter == 0) {     // <---
        pthread_mutex_lock (&lock);
        if (counter == 0)
            ++counter;
        pthread_mutex_unlock (&lock);
    }
    else {
        // counter value is safe to use here
    }
}

-- Stefan Teleman KDE e.V. stefan.tele...@gmail.com
Re: STDCXX-1056 [was: Re: STDCXX forks]
On 09/18/12 13:21, Stefan Teleman wrote: On Tue, Sep 18, 2012 at 12:43 PM, Liviu Nicoara nikko...@hates.ms wrote: I am attaching a test program which, while 100% MT-safe, is flagged by the Solaris thread analyzer. The program as written is not thread safe. It is reading the value of the counter variable and performing a zero comparison outside of a mutex lock:

Stefan, I urge you to consider the argument on its merits. Yes, the thread analyzer flags it, but it is nonetheless MT-safe. Specifically:

1. writes are properly synchronized wrt themselves
2. reads are inherently thread-safe wrt themselves
3. reads are properly synchronized wrt writes
4. no thread can possibly observe an intermediate or otherwise incomplete value.

I will also add that the flag is either 0 or 1 during the execution of the program, with only one transition from 0 to 1, performed by one single thread. I will concede that I might be wrong and I am open to arguments. I would accept as a counter-argument this program if you can show a runtime failure. I would also accept as argument a scenario under which two threads would see inconsistent/incorrect values or write the variable more than once, etc. Thanks, Liviu

for (size_t i = 0; i < nloops; ++i) {
    if (counter == 0) {     // <---
        pthread_mutex_lock (&lock);
        if (counter == 0)
            ++counter;
        pthread_mutex_unlock (&lock);
    }
    else {
        // counter value is safe to use here
    }
}
Re: STDCXX-1056 [was: Re: STDCXX forks]
On Tue, Sep 18, 2012 at 4:35 PM, Liviu Nicoara nikko...@hates.ms wrote: I will concede that I might be wrong and I am open to arguments. I would accept as a counter-argument this program if you can show a runtime failure.

The first read of the counter variable is outside a mutex lock, correct? The read is followed by a 0 comparison, correct? What guarantees that between the read and the comparison the value of the counter variable hasn't been modified by another thread? The thread currently doing the comparison cannot guarantee it: it hasn't locked the mutex. Other threads may be running - actually they probably are. Another thread may have already acquired the mutex and incremented the value of counter. Your thread has no way of knowing whether that has happened, because it does not yet have exclusive access to the counter variable. It will, after it acquires the mutex. Where is it reading the variable from? A register? Is it declared volatile? L2 cache? L3 cache? The program, as you wrote it, implicitly acknowledges that it is not thread safe. That is the point of the double check: one before the mutex lock, and one after it. The point of the first check has nothing to do with thread-safety, and everything to do with a minor optimization: if the value stored in the counter variable is already non-zero, then there's no point in locking the mutex or performing the increment. --Stefan -- Stefan Teleman KDE e.V. stefan.tele...@gmail.com
Re: STDCXX-1056 [was: Re: STDCXX forks]
On 09/18/12 18:24, Stefan Teleman wrote: On Tue, Sep 18, 2012 at 4:35 PM, Liviu Nicoara nikko...@hates.ms wrote: I will concede that I might be wrong and I am open to arguments. I would accept as a counter-argument this program if you can show a runtime failure. The first read of the counter variable is outside a mutex lock, correct? The read is followed by a 0 comparison, correct?

Correct.

What guarantees that between the read and the comparison the value of the counter variable hasn't been modified by another thread? The thread currently doing the comparison cannot guarantee it: it hasn't locked the mutex. Other threads may be running - actually they probably are. Another thread may have already acquired the mutex and incremented the value of counter. Your thread has no way of knowing if that has happened, because it does not yet have exclusive access to the counter variable. It will, after it acquires the mutex.

It does not matter if the value is changed concurrently in between the reading of it and the actual comparison. The value can only be 0, in which case the thread takes the branch that locks the mutex and the next read will absolutely see a fresh value; or it can be 1, in which case the flag has already been initialized and this thread does not pay the penalty of a lock anymore.

Where is it reading the variable from? A register? Is it declared volatile? L2 cache? L3 cache?

Irrelevant. If the value has not been changed it can be read either from the cache or directly from memory. If the value has been changed concurrently in between the reading and the actual comparison, the processor sees either a stale value of 0, or the new value 1. It is guaranteed that any thread reading the value after the (one and only) unlock will see a fresh value.

The program, as you wrote it, implicitly acknowledges that it is not thread safe. That is the point of the double check: one before the mutex lock, and one after it.

That is a misstatement of the program's intentions. See below.
The point of the first check has nothing to do with thread-safety, and everything to do with a minor optimization: if the value stored in variable counter is already not zero, then there's no point in locking the mutex or performing the increment. The first check is indeed an optimization. It is the point of this exercise to show that the unguarded reads in the localization library are not defects and the code, simplified in my test case, is thread safe in exactly the respects I mentioned before: the program only observes consistent, correct values (program states) in a concurrent environment. HTH. Liviu
Re: STDCXX-1056 [was: Re: STDCXX forks]
On 9/18/12 7:04 PM, Stefan Teleman wrote: On Tue, Sep 18, 2012 at 6:42 PM, Liviu Nicoara nikko...@hates.ms wrote: The first check is indeed an optimization. It is the point of this exercise to show that the unguarded reads in the localization library are not defects and the code, simplified in my test case, is thread safe in exactly the respects I mentioned before: the program only observes consistent, correct values (program states) in a concurrent environment.

The Intel Thread Analyzer shows a data race in the exact same spot the SunPro Thread Analyzer shows a data race, and in the exact same spot I have stated earlier that there is a data race. The results are available here:

http://s247136804.onlinehome.us/liviu-mt-test/22.mt.test.r000ti3.inspxez

As usual, it is an Intel Inspector XE 2003 Analysis Results Blob. It opens with the Intel Inspector. I don't know what format it is in, or whether or not it can be read by other means. So far we have SunPro 12.3/Linux/Intel, SunPro 12.3/Solaris/Intel, SunPro 12.3/Solaris/SPARC and Intel 2003/Linux/Intel saying that there is a race condition in your program. And now I would like to get back to solving the thread-unsafety and race condition problems in Apache stdcxx.

Stefan, you cannot ignore others' arguments and embrace a 'no' attitude; the tools are not saying what you think they do. They all show the same thing because they all do the exact same analysis, and we could delve into that, too. I sincerely hoped you could come up with some sort of logical argument that would prove my latest test case wrong, or at least show it failing at runtime. My time is also limited and I have already spent more time than I wanted trying to present a logical argument, down to the simplest test cases. I say we defer this to Martin for when he is back from vacation, next week.

For the record, because people may not have the energy to read all this back and forth, I will summarize here the gist of this thread:

1. The facet data caching is not MT-safe
2. The facet data initialization (STDCXX or system locales) is safe (*)
3. There is no unit test currently showing a failure in (2)
4. Timing results show that caching may be slower than non-caching in MT builds
5. A fix should, ideally, be binary compatible
6. A fix should, ideally, preserve performance or increase it
7. There is one patch, currently attached to the issue, by Stefan
8. Other partial patches are referenced from this thread

Please correct me if I missed anything. The above summary is a good starting point for a new thread dealing with this issue. HTH. Liviu

(*) When perfectly forwarding, with a theoretical concern I asked about in an earlier post, directed at Martin.
Re: STDCXX-1056 [was: Re: STDCXX forks]
On Tue, Sep 18, 2012 at 7:38 PM, Liviu Nicoara nikko...@hates.ms wrote:

1. The facet data caching is not MT-safe
2. The facet data initialization (STDCXX or system locales) is safe (*)
3. There is no unit test currently showing a failure in (2)
4. Timing results show that caching may be slower than non-caching in MT builds
5. A fix should, ideally, be binary compatible
6. A fix should, ideally, preserve performance or increase it
7. There is one patch, currently attached to the issue, by Stefan
8. Other partial patches are referenced from this thread

Please correct me if I missed anything. The above summary is a good starting

For the record, I fundamentally disagree with your assessment above. It is not based on any verifiable and reproducible facts, analysis of facts and/or measurements. It is based on assertions, which are, in turn, based on other assertions. As long as you continue dismissing the results from 4 [FOUR] different thread analyzers, on 4 [FOUR] different operating systems, solely on the basis of your assertions and beliefs, there is no point in continuing this debate. The results from all four thread analyzers contradict all of your assertions. If you firmly and strongly believe that you are always right, and that the four thread analyzers are always wrong, that is your choice. --Stefan -- Stefan Teleman KDE e.V. stefan.tele...@gmail.com
Re: STDCXX-1056 [was: Re: STDCXX forks]
On Mon, Sep 17, 2012 at 8:46 AM, Liviu Nicoara nikko...@hates.ms wrote: In the meantime I would like to stress again that __rw_get_numpunct is perfectly thread-safe and does not need extra locking for perfect forwarding.

So, by removing the test for

if (!(_C_flags & _RW::__rw_gr))

(or any other bitmask for that matter), the functions which were thread-unsafe - and were exhibiting all the symptoms of a run-time race condition - magically became thread-safe? I have looked *extensively* at the code in __rw_get_numpunct. It is inherently thread-unsafe. --Stefan -- Stefan Teleman KDE e.V. stefan.tele...@gmail.com
Re: STDCXX-1056 [was: Re: STDCXX forks]
On 09/17/12 09:51, Stefan Teleman wrote: On Mon, Sep 17, 2012 at 8:46 AM, Liviu Nicoara nikko...@hates.ms wrote: In the meantime I would like to stress again that __rw_get_numpunct is perfectly thread-safe and does not need extra locking for perfect forwarding.

So, by removing the test for if (!(_C_flags & _RW::__rw_gr)) (or any other bitmask for that matter), the functions which were thread-unsafe - and were exhibiting all the symptoms of a run-time race condition - magically became thread-safe? I have looked *extensively* at the code in __rw_get_numpunct. It is inherently thread-unsafe.

I mean to say that no extra locking is necessary when the public interface forwards and no caching is done. In more detail: the __rw_get_numpunct code is entered upon a call from the protected virtual interface. It calls the facet's _C_data member function, which either returns objects constructed from previously-initialized POD data (in which case no locking is necessary), or it attempts to first initialize the facet data from the STDCXX database.
If the latter, the execution goes into the facet data initialization code, which is appropriately synchronized; see this stack trace:

(gdb) bt
#0  __rw::__rw_facet::_C_get_data (this=0x6e10a0) at srcdir/stdcxx/branches/4.2.x/src/facet.cpp:179
#1  0x0043788b in __rw::__rw_facet::_C_data (this=0x6e10a0) at srcdir/stdcxx/branches/4.2.x/include/loc/_facet.h:194
#2  0x0044874f in __rw::__rw_get_numpunct (pfacet=0x6e10a0, flags=4) at srcdir/stdcxx/branches/4.2.x/src/punct.cpp:80
#3  0x00449a14 in __rw::__rw_get_punct (pfacet=0x6e10a0, flags=4) at srcdir/stdcxx/branches/4.2.x/src/punct.cpp:578
#4  0x0041cecf in std::numpunct<char>::do_grouping (this=0x6e10a0) at srcdir/stdcxx/branches/4.2.x/include/loc/_numpunct.h:99
#5  0x0041bd31 in std::numpunct<char>::grouping (this=0x6e10a0) at srcdir/stdcxx/branches/4.2.x/include/loc/_numpunct.h:190
#6  0x004036b8 in main (argc=2, argv=0x7fffe018) at srcdir/stdcxx/branches/4.2.x/tests/localization/s.cpp:29
(gdb)

When the above fails, the facet data has not been initialized, e.g., when there is no STDCXX locale database, or the name of the locale does not refer to a locale in the STDCXX database. The __rw_get_numpunct function will next attempt to use libc and the system locales, via the __rw_setlocale class, which again is properly synchronized, and the scope of that lock extends to cover the facet data initialization.
See this stack trace:

(gdb) bt
#0  __rw::__rw_setlocale::__rw_setlocale (this=0x7fffdd30, locname=0x6e112a "en_US.utf8", cat=6, nothrow=0) at srcdir/stdcxx/branches/4.2.x/src/setlocale.cpp:84
#1  0x00448974 in __rw::__rw_get_numpunct (pfacet=0x6e10a0, flags=4) at srcdir/stdcxx/branches/4.2.x/src/punct.cpp:134
#2  0x00449a14 in __rw::__rw_get_punct (pfacet=0x6e10a0, flags=4) at srcdir/stdcxx/branches/4.2.x/src/punct.cpp:578
#3  0x0041cecf in std::numpunct<char>::do_grouping (this=0x6e10a0) at srcdir/stdcxx/branches/4.2.x/include/loc/_numpunct.h:99
#4  0x0041bd31 in std::numpunct<char>::grouping (this=0x6e10a0) at srcdir/stdcxx/branches/4.2.x/include/loc/_numpunct.h:190
#5  0x004036b8 in main (argc=2, argv=0x7fffe018) at srcdir/stdcxx/branches/4.2.x/tests/localization/s.cpp:29

I hope you agree that this synchronization is sufficient for the facet initialization and reading of facet data. Thanks. Liviu
Re: STDCXX-1056 [was: Re: STDCXX forks]
On Mon, Sep 17, 2012 at 11:17 AM, Liviu Nicoara nikko...@hates.ms wrote: I hope you agree that this synchronization is sufficient for the facet initialization and reading of facet data. Sorry, I do not agree. Two different thread analyzers from two different compilers written by two different compiler writers tell me not to. --Stefan -- Stefan Teleman KDE e.V. stefan.tele...@gmail.com
Re: STDCXX-1056 [was: Re: STDCXX forks]
On 09/17/12 11:21, Stefan Teleman wrote: On Mon, Sep 17, 2012 at 11:17 AM, Liviu Nicoara nikko...@hates.ms wrote: I hope you agree that this synchronization is sufficient for the facet initialization and reading of facet data. Sorry, I do not agree. Two different thread analyzers from two different compilers written by two different compiler writers tell me not to. Stefan, that is indeed your prerogative. However, please keep in mind that tools may be buggy or may have limitations beyond what is advertised. I do not ask you to have faith in my analysis, which would be absurd, but to look for yourself, exercise due diligence in testing the code and draw your own conclusions. Thanks, Liviu
Re: STDCXX-1056 [was: Re: STDCXX forks]
Liviu Nicoara nikko...@hates.ms writes: On 09/17/12 11:21, Stefan Teleman wrote: On Mon, Sep 17, 2012 at 11:17 AM, Liviu Nicoara nikko...@hates.ms wrote: I hope you agree that this synchronization is sufficient for the facet initialization and reading of facet data. Sorry, I do not agree. Two different thread analyzers from two different compilers written by two different compiler writers tell me not to. Stefan, that is indeed your prerogative. However, please keep in mind that tools may be buggy or may have limitations beyond what is advertised. I do not ask you to have faith in my analysis, which would be absurd, but to look for yourself, exercise due diligence in testing the code and draw your own conclusions. so which compilers do fail? You know, some of them might use the same component. -- Wojciech
Re: STDCXX-1056 [was: Re: STDCXX forks]
On Mon, Sep 17, 2012 at 11:38 AM, Wojciech Meyer wojciech.me...@arm.com wrote: so which compilers do fail? You know, some of them might use the same component. Intel Compiler/Thread Analyzer on Linux, SunPro Compiler/Thread Analyzer on Linux and Solaris (Intel and SPARC). All three of them show the same exact problems. The Intel Compilers and the SunPro Compilers have nothing in common with each other. -- Stefan Teleman KDE e.V. stefan.tele...@gmail.com
Re: STDCXX-1056 [was: Re: STDCXX forks]
On Sat, Sep 15, 2012 at 4:53 PM, Liviu Nicoara nikko...@hates.ms wrote: Now, to clear the confusion I created: the timing numbers I posted in the attachment stdcxx-1056-timings.tgz to STDCXX-1066 (09/11/2012) showed that a perfectly forwarding, no caching public interface (exemplified by a changed grouping) performs better than the current implementation. It was that test case that I hoped you could time, perhaps on SPARC, in both MT and ST builds. The t.cpp program is for MT, s.cpp for ST. I got your patch, and have tested it. I have created two Experiments (that's what they are called) with the SunPro Performance Analyzer. Both experiments are targeting race conditions and deadlocks in the instrumented program, and both experiments are running the 22.locale.numpunct.mt program from the stdcxx test harness. One experiment is with your patch applied. The other experiment is with our (Solaris) patch applied. Here are the results: 1. with your patch applied: http://s247136804.onlinehome.us/22.locale.numpunct.mt.1.er.nts/ 2. with our (Solaris) patch applied: http://s247136804.onlinehome.us/22.locale.numpunct.mt.1.er.ts/ --Stefan -- Stefan Teleman KDE e.V. stefan.tele...@gmail.com
Re: STDCXX-1056 [was: Re: STDCXX forks]
On 9/16/12 3:20 AM, Stefan Teleman wrote: On Sat, Sep 15, 2012 at 4:53 PM, Liviu Nicoara nikko...@hates.ms wrote: Now, to clear the confusion I created: the timing numbers I posted in the attachment stdcxx-1056-timings.tgz to STDCXX-1066 (09/11/2012) showed that a perfectly forwarding, no caching public interface (exemplified by a changed grouping) performs better than the current implementation. It was that test case that I hoped you could time, perhaps on SPARC, in both MT and ST builds. The t.cpp program is for MT, s.cpp for ST. I got your patch, and have tested it. Thanks, Stefan. I looked over it and it seems very similar to, and somewhat more detailed than, gprof profiling output. I am going to update the incident shortly with more detailed timing measurements on my side, in the form of a new attachment. Just FYI in case you still don't get notifications. Liviu
Re: STDCXX-1056 [was: Re: STDCXX forks]
On 9/16/12 11:21 AM, Liviu Nicoara wrote: On 9/16/12 3:20 AM, Stefan Teleman wrote: On Sat, Sep 15, 2012 at 4:53 PM, Liviu Nicoara nikko...@hates.ms wrote: Now, to clear the confusion I created: the timing numbers I posted in the attachment stdcxx-1056-timings.tgz to STDCXX-1066 (09/11/2012) showed that a perfectly forwarding, no caching public interface (exemplified by a changed grouping) performs better than the current implementation. It was that test case that I hoped you could time, perhaps on SPARC, in both MT and ST builds. The t.cpp program is for MT, s.cpp for ST. I got your patch, and have tested it. Thanks, Stefan. I looked over it and it seems very similar to, and somewhat more detailed than, gprof profiling output. I am going to update the incident shortly with more detailed timing measurements on my side, in the form of a new attachment. Just FYI in case you still don't get notifications. I have attached a new set of results to the incident, in the form of the archive: http://tinyurl.com/9drzg4e Please see the content for a description of the library changes (_numpunct.h file), the MT test program (t.cpp) and the results collected through two separate builds on two different machines (results.txt file). Thanks. Liviu
STDCXX-1066 [was: Re: STDCXX forks]
On 9/1/12 1:52 PM, Stefan Teleman wrote: On Sat, Sep 1, 2012 at 12:15 PM, I opened yesterday STDCXX-1066: https://issues.apache.org/jira/browse/STDCXX-1056 about the pthread_mutex_t/pthread_cond_t alignment on SPARCV8. I'll have patches done this weekend. Achtung: the patchset is very large and touches a very large number of files. It's strange that I didn't get an email about STDCXX-1066. Hi Stefan, I have read through the patches attached to the incident, then I briefly read about the SunPro pragma align and pack. Two questions: 1. AFAICT, the use of the packing pragma may interfere with a user's setting of the same value. I.e., a user sets the packing in their sources and then, directly or not, includes an STDCXX header. It seems to me that in such a situation, our setting of the packing value would interfere with the rest of the user's translation unit, since there is no way to `restore' the previous packing value. Something along the lines of:

// user source file
#pragma pack (X) // X != 8
#include <iostream>

struct UserDef {
    // different alignment than X ?
    // ...
};

Is my understanding correct? 2. The patches are against 4.2.1, but the change would be binary incompatible with the already released 4.2.1 branch. Do you plan to have this fix in 4.3.x? Thanks. Liviu
Re: STDCXX-1056 [was: Re: STDCXX forks]
On Sat, Sep 15, 2012 at 9:01 AM, Liviu Nicoara nikko...@hates.ms wrote: That is funny. What compiler are you using? What does the following test case return for you? It's the Intel compiler with the patched stdcxx for the wrong case and GCC 4.7.1 + GNU libstdc++ for the correct case. GCC + GNU libstdc++ are correct. The patched facet does not call the protected do_*() virtual functions from their public counterparts, as it is required to do by the Standard. Instead, it returns the data members directly (the data members were initialized in the constructor). That is the patch you proposed, which is indeed much better performing than using a mutex lock. Unfortunately, in doing so, overriding the virtual functions in a derived facet type becomes pointless. --Stefan -- Stefan Teleman KDE e.V. stefan.tele...@gmail.com
Re: STDCXX-1056 [was: Re: STDCXX forks]
On 9/15/12 1:51 PM, Stefan Teleman wrote: On Sat, Sep 15, 2012 at 9:01 AM, Liviu Nicoara nikko...@hates.ms wrote: That is funny. What compiler are you using? What does the following test case return for you? It's the Intel compiler with the patched stdcxx for the wrong case and GCC 4.7.1 + GNU libstdc++ for the correct case. GCC + GNU libstdc++ are correct. The patched facet does not call the protected do_*() virtual functions from their public counterparts, as it is required to do by the Standard. Instead, it returns the data members directly (the data members were initialized in the constructor). That is the patch you proposed, which is indeed much better performing than using a mutex lock. Unfortunately, in doing so, overriding the virtual functions in a derived facet type becomes pointless. Ahh, I see now. You are indeed right, that patch is defective. I was under the impression that we were discussing the (later) attachment stdcxx-1056-timings.tgz, which contains a perfectly forwarding implementation of the facet's public grouping method. The timings I attached there were the ones I thought we were discussing all along. Now, to clear the confusion I created: the timing numbers I posted in the attachment stdcxx-1056-timings.tgz to STDCXX-1066 (09/11/2012) showed that a perfectly forwarding, no caching public interface (exemplified by a changed grouping) performs better than the current implementation. It was that test case that I hoped you could time, perhaps on SPARC, in both MT and ST builds. The t.cpp program is for MT, s.cpp for ST. Please let me know if this clarifies things. I apologize for the misunderstanding. Liviu
Re: STDCXX-1066 [was: Re: STDCXX forks]
On 9/15/12 2:57 PM, Stefan Teleman wrote: On Sat, Sep 15, 2012 at 1:02 PM, Liviu Nicoara nikko...@hates.ms wrote: I have read through the patches attached to the incident, then I briefly read about the SunPro pragma align and pack. Two questions: 1. AFAICT, the use of the packing pragma may interfere with a user's setting of the same value. I.e., a user sets the packing in their sources and then, directly or not, includes an STDCXX header. It seems to me that in such a situation, our setting of the packing value would interfere with the rest of the user's translation unit, since there is no way to `restore' the previous packing value. Something along the lines of:

// user source file
#pragma pack (X) // X != 8
#include <iostream>

struct UserDef {
    // different alignment than X ?
    // ...
};

Is my understanding correct? Yes, you are indeed correct, Sir. :-) However, for every #pragma pack(8) there should be a corresponding subsequent #pragma pack(0). If there isn't, that's a patch bug. #pragma pack(0) restores the default alignment. So, there should be (there *must* be) no silent packing side-effects in user programs. Yes, but it restores the default packing, not an arbitrary one, potentially set by the user prior to including our headers. Say, the user sets 2, the default is 4 and we set 8. When we set it to default it goes back to 4, instead of the expected 2. Did I get this right? 2. The patches are against 4.2.1, but the change would be binary incompatible with the already released 4.2.1 branch. Do you plan to have this fix in 4.3.x? Yes, definitely for 4.2.x and 4.3.x. I sent them for 4.2.1 just because that's what I have handy. I see now. I tried to apply them to 4.2.x and some chunks failed, and I thought I was not applying them where they were intended to go. Thanks! Liviu
Re: STDCXX-1066 [was: Re: STDCXX forks]
On 9/15/12 5:19 PM, Stefan Teleman wrote: On Sat, Sep 15, 2012 at 4:57 PM, Liviu Nicoara nikko...@hates.ms wrote: Yes, but it restores the default packing, not an arbitrary one, potentially set by the user prior to including our headers. Say, the user sets 2, the default is 4 and we set 8. When we set it to default it goes back to 4, instead of the expected 2. Did I get this right? This is true, but leaving some arbitrary pragma pack(N) (for N != 0) in effect for the duration of a program, and expecting it to work, sounds like a very defective programming approach to me. It will certainly not work on SPARC at all. If the program needs to pack something in a certain specific way, it needs to do so for that specific something only. Otherwise the side-effects of globally setting a non-default packing will destroy the program anyway. I merely wanted to point out that restoring the default packing is not the same as restoring the packing to the previous value in effect. Given this, I thought about an alternative way of forcing this alignment, e.g., via a union, aligned on an appropriate type. I see an advantage here in that most of the changes would occur where we define the 'primitive' mutex and condition wrappers, saving a few pre-processor conditionals and pragmas along the way. What do you think? Liviu
Re: STDCXX-1056 [was: Re: STDCXX forks]
On 09/12/12 00:21, Stefan Teleman wrote: On Tue, Sep 11, 2012 at 10:18 PM, Liviu Nicoara nikko...@hates.ms wrote: AFAICT, there are two cases to consider: 1. Using the STDCXX locale database initializes the __rw_punct_t data in the first, properly synchronized pass through __rw_get_numpunct. All subsequent calls use the __rw_punct_t data to construct returned objects. 2. Using the C library locales does the same in the first pass, via setlocale and localeconv, but setlocale synchronization is via a per-process lock. The facet data, once initialized, is used just like above. I probably missed this in the previous conversation, but did you detect a race condition in the tests if the facets are simply forwarding to the private virtual interface? I.e., did you detect that the facet initialization code is unsafe? I think the facet __rw_punct_t data is safely initialized in both cases, it's the caching that is done incorrectly. I originally thought so too, but now I'm having doubts. :-) And I haven't tracked it down with 100% accuracy yet. I saw today this comment in src/facet.cpp, line 358:

// a per-process array of facet pointers sufficiently large
// to hold (pointers to) all standard facets for 8 locales
static __rw_facet* std_facet_buf [__rw_facet::_C_last_type * 8];

The localization code is cleverly dense and low-level. There are a few patterns that are used throughout, though. One is the function-local static buffers, e.g., the locales buffer and the facets buffer, in locale_body.cpp and facet.cpp, respectively. They both start from a reasonable size and are grown and shrunk according to need, but neither locales nor facets are ever evicted from them.
See:

locale_body:936  -- reduction of locale buffer
locale_body:1006 -- expansion of locale buffer
facet.cpp:397    -- reduction of facet buffer
facet.cpp:462    -- expansion of facet buffer

The facets buffer is guarded by a static guard in __rw_facet::_C_manage:366. The locales buffer is protected by a static guard in __rw_locale::_C_manage:880. [...] this is all investigative stuff for tomorrow. :-) and I agree with Martin that breaking ABI in a minor release is really not an option. I'm trying to find the best way of making these facets thread-safe while inflicting the least horrible performance hit. We are getting close to the point where we can summarize all these findings and gauge an appropriate course of action. I will run your tests tomorrow and let you know. :-) That would be very helpful! Liviu
Re: STDCXX-1056 [was: Re: STDCXX forks]
On 9/11/12 9:40 PM, Martin Sebor wrote: On 09/11/2012 04:15 PM, Stefan Teleman wrote: On Mon, Sep 10, 2012 at 4:24 PM, Stefan Teleman I think I have something which doesn't break BC - stay tuned because I'm testing it now. OK. So, here's a possible implementation of __rw_get_numpunct() with minimal locking, which passes the MT tests and does not break ABI: s247136804.onlinehome.us/stdcxx-1056-20120911/punct.cpp And the same for include/loc/_numpunct.h: http://s247136804.onlinehome.us/stdcxx-1056-20120911/_numpunct.h In _numpunct.h, all the functions perform no checks and no lazy initialization. They function simply as a pass-through to __rw_get_numpunct(). std::numpunct<T>'s data members are now dead variables. The bad: performance is no better than with locking the mutex inside each of the std::numpunct<T>::*() functions, and with lazy instantiation. I wouldn't expect this to be faster than the original. In fact, I would expect it to be slower, because each call to one of the public, non-virtual members results in a call to the out-of-line virtual functions (and another to __rw_get_moneypunct). Avoiding the overhead of such calls is the main and only reason why the caching exists. AFAICT, there are two cases to consider: 1. Using the STDCXX locale database initializes the __rw_punct_t data in the first, properly synchronized pass through __rw_get_numpunct. All subsequent calls use the __rw_punct_t data to construct returned objects. 2. Using the C library locales does the same in the first pass, via setlocale and localeconv, but setlocale synchronization is via a per-process lock. The facet data, once initialized, is used just like above. I probably missed this in the previous conversation, but did you detect a race condition in the tests if the facets are simply forwarding to the private virtual interface? I.e., did you detect that the facet initialization code is unsafe?
I think the facet __rw_punct_t data is safely initialized in both cases, it's the caching that is done incorrectly. I'm afraid unoptimized timings don't tell us much. Neither does a comparison between two compilers, even on the same OS. I looked at Liviu's timings today. I was puzzled by the difference between (1) which, IIUC, is the current implementation (presumably an optimized, thread-safe build with the same compiler and OS) and (4), which, again IIUC, is the equivalent of your latest patch here (again, presumably optimized, thread-safe, same compiler/OS). I'm having trouble envisioning how calling a virtual function to retrieve the value of grouping can possibly be faster than not calling it (and simply returning the value cached in the data member of the facet). The new results I attached to the issue come from a bit clearer tests, and they focus on just two cases: the current implementation vs. a non-caching one; the latter just forwards the grouping calls to the protected do_grouping, with _no_ other changes to the implementation. The timing numbers seem to show that MT builds fare far worse with the caching than without. Stefan, if you have the time, could you please infirm :) my conclusions by timing it on one of your machines? Thanks, Liviu
Re: STDCXX-1056 [was: Re: STDCXX forks]
On Tue, Sep 11, 2012 at 10:18 PM, Liviu Nicoara nikko...@hates.ms wrote: AFAICT, there are two cases to consider: 1. Using STDCXX locale database initializes the __rw_punct_t data in the first, properly synchronized pass through __rw_get_numpunct. All subsequent calls use the __rw_punct_t data to construct returned objects. 2. Using the C library locales does the same in the first pass, via setlocale and localeconv, but setlocale synchronization is via a per-process lock. The facet data, once initialized is used just like above. I probably missed this in the previous conversation, but did you detect a race condition in the tests if the facets are simply forwarding to the private virtual interface? I.e., did you detect that the facet initialization code is unsafe? I think the facet __rw_punct_t data is safely initialized in both cases, it's the caching that is done incorrectly. I originally thought so too, but now I'm having doubts. :-) And I haven't tracked it down with 100% accuracy yet. I saw today this comment in src/facet.cpp, line 358: // a per-process array of facet pointers sufficiently large // to hold (pointers to) all standard facets for 8 locales static __rw_facet* std_facet_buf [__rw_facet::_C_last_type * 8]; this leads me to suspect that there is an upper limit of 8 locales + their standard facets. If the locales (and their facets) are being recycled in and out of this 8-limit cache, that would explain the other thing I also noticed (which also answers your question): yes, i have gotten the dreaded strcmp(3C) 'Assertion failed' in 22.locale.numpunct.mt when I test implemented 22.locale.numpunct.mt in a similar way to your tests. which in theory shouldn't happen, but it did. which means that there's something going on with behind-the-scenes facet re-initialization that i haven't found yet. which would partially explain your observation that MT-tests perform much worse with caching than without. this is all investigative stuff for tomorrow. 
:-) and I agree with Martin that breaking ABI in a minor release is really not an option. I'm trying to find the best way of making these facets thread-safe while inflicting the least horrible performance hit. i will run your tests tomorrow and let you know. :-) --Stefan -- Stefan Teleman KDE e.V. stefan.tele...@gmail.com
Re: STDCXX-1056 [was: Re: STDCXX forks]
On Thu, Sep 6, 2012 at 6:43 PM, Stefan Teleman stefan.tele...@gmail.com wrote: On Thu, Sep 6, 2012 at 4:02 PM, Martin Sebor mse...@gmail.com wrote: Here's a thought: it's not pretty, but how about having the function initialize the facet? It already has a pointer to the base class, so it could downcast it to std::numpunct (or moneypunct, respectively), and assign the initial values to the members. Would that work? I haven't looked at them in detail (yet) but a cursory look shows that they're both recursive for the successful case. It's not going to work that way. For one, __rw_get_numpunct() and __rw_get_moneypunct() are static functions in the __rw namespace. Neither can access or modify the std::numpunct<T> or std::moneypunct<T> data members directly, because they are private. Second, both __rw_get_numpunct() and __rw_get_moneypunct() are recursive. Unless we want to start playing with PTHREAD_MUTEX_RECURSIVE, which I'm not at all sure is supported on all the platforms we support, we're not going to be able to solve the thread-safety problem here (Linux supports it as PTHREAD_MUTEX_RECURSIVE_NP, Solaris supports it as PTHREAD_MUTEX_RECURSIVE). Third, both __rw_get_numpunct() and __rw_get_moneypunct() can return a NULL pointer. This is bad, because it will cause a SEGV at string assignment, during a call to either of the std::numpunct<T>::grouping(), std::numpunct<T>::truename(), etc. functions. We should fix this and throw an exception instead. The Standard doesn't say that any of these functions can throw, but it doesn't say they can't throw either. And both __rw_get_numpunct() and __rw_get_moneypunct() throw already. --Stefan -- Stefan Teleman KDE e.V. stefan.tele...@gmail.com
Re: STDCXX-1056 [was: Re: STDCXX forks]
On 09/10/2012 10:56 AM, Stefan Teleman wrote: On Thu, Sep 6, 2012 at 6:43 PM, Stefan Teleman stefan.tele...@gmail.com wrote: On Thu, Sep 6, 2012 at 4:02 PM, Martin Sebor mse...@gmail.com wrote: Here's a thought: it's not pretty but how about having the function initialize the facet? It already has a pointer to the base class, so it could downcast it to std::numpunct (or moneypunct, respectively), and assign the initial values to the members. Would that work? I haven't looked at them in detail (yet) but a cursory look shows that they're both recursive for the successful case. It's not going to work that way. For one, __rw_get_numpunct() and __rw_get_moneypunct() are static functions in the __rw namespace. Neither can access or modify the std::numpunct<T> or std::moneypunct<T> data members directly, because they are private. There are ways around it -- see, for example, the hack in src/access.h. This gives us, the implementation, a way to cleanly access private data without exposing them to user programs. In our case, we could solve our problem by having the facet declare some helper (defined only in punct.cpp) a friend and invoking the helper from __rw_get_numpunct. That said, I'd certainly prefer to avoid hacks as much as possible. This problem could perhaps more cleanly be solved by having the facet pass a reference to the string (or to all of its internal data) to modify to the function (or something like that). Unfortunately, it would break binary compatibility. Second, both __rw_get_numpunct() and __rw_get_moneypunct() are recursive. Unless we want to start playing with PTHREAD_MUTEX_RECURSIVE, which I'm not at all sure is supported on all the platforms we support, we're not going to be able to solve the thread-safety problem here (Linux supports it as PTHREAD_MUTEX_RECURSIVE_NP, Solaris supports it as PTHREAD_MUTEX_RECURSIVE). If I remember correctly, the recursion is quite simple.
The first stage (pfacet->_C_data () != 0) sets up the internal data and takes place just once per facet object. The second (recursive) stage then extracts and returns it. I only looked at it briefly last week, but from what I saw I'd say it should be possible to lock only the second, non-recursive stage and do the assignment there. Third, both __rw_get_numpunct() and __rw_get_moneypunct() can return a NULL pointer. This is bad, because it will cause a SEGV at string assignment, during a call to either of the std::numpunct<T>::grouping(), std::numpunct<T>::truename(), etc. functions. We should fix this and throw an exception instead. The Standard doesn't say that any of these functions can throw, but it doesn't say they can't throw either. And both __rw_get_numpunct() and __rw_get_moneypunct() throw already. The functions only return NULL on internal error (i.e., a bug in the implementation). There's an assertion right above the return statement that will fire in a debug build. There's no point in throwing an exception to the user for our own bugs (they would have no way to recover from it). It's better to abort. We might want to do that (i.e., abort) unconditionally in this case, even when assertions are disabled. Martin --Stefan
Re: STDCXX-1056 [was: Re: STDCXX forks]
On Mon, Sep 10, 2012 at 2:21 PM, Liviu Nicoara nikko...@hates.ms wrote: 4. Without caching of grouping values, grouping() delegates always to do_grouping():

real    0m5.668s
user    1m11.389s
sys     0m3.952s

FWIW, 22.2.3.1.1 explicitly states that all of the decimal_point(), grouping(), truename(), falsename() must return their do_*() counterparts. --Stefan -- Stefan Teleman KDE e.V. stefan.tele...@gmail.com
Re: STDCXX-1056 [was: Re: STDCXX forks]
On 09/10/12 15:01, Stefan Teleman wrote: On Mon, Sep 10, 2012 at 2:21 PM, Liviu Nicoara nikko...@hates.ms wrote: 4. Without caching of grouping values, grouping() delegates always to do_grouping():

real    0m5.668s
user    1m11.389s
sys     0m3.952s

FWIW, 22.2.3.1.1 explicitly states that all of the decimal_point(), grouping(), truename(), falsename() must return their do_*() counterparts. Yes, of course. :-) I just meant to say that it is _always_ delegating to do_grouping(), i.e., it does no caching. Liviu --Stefan -- And now I see with eye serene The very pulse of the machine.
Re: STDCXX-1056 [was: Re: STDCXX forks]
On Mon, Sep 10, 2012 at 1:32 PM, Martin Sebor mse...@gmail.com wrote: That said, I'd certainly prefer to avoid hacks as much as possible. This problem could perhaps more cleanly be solved by having the facet pass a reference to the string (or to all of its internal data) to modify to the function (or something like that). Unfortunately, it would break binary compatibility. I think I have something which doesn't break BC - stay tuned because I'm testing it now. --Stefan -- Stefan Teleman KDE e.V. stefan.tele...@gmail.com
Re: STDCXX-1056 [was: Re: STDCXX forks]
On 09/06/12 19:54, Martin Sebor wrote: I'm not sure how easily we can do that. Almost all of locale is initialized lazily. Some of the layers might depend on the facets being initialized lazily as well. This was a deliberate design choice. One of the constraints was to avoid dynamic initialization or allocation at startup. [...] There would be a performance degradation. IMHO, it would be minor and would simplify the code considerably. After finally being able to reproduce the defect with SunPro 12.3 on x86_64, I tried to remove the lazy initialization and a (smaller) test case now passes. I am attaching the program and the numpunct patch. Out of curiosity, does it still work if you move the locale into the thread function? Like this: I created the test case because I wanted something more specific and with more predictable results. Initially, the locale object was created in the thread and it was passing. Also, does the 27.objects test pass with this patch? Will try. I don't think that we can actually use this patch; I bundled it with the test so that people can test the approach right away. I don't know if we have tests for it, but writing to cerr/cout in a bad_alloc handler should succeed. I.e., this should not abort:

try {
    struct rlimit rlim = { };
    getrlimit (RLIMIT_DATA, &rlim);
    rlim.rlim_cur = 0;
    setrlimit (RLIMIT_DATA, &rlim);
    for ( ; ; )
        new int;
}
catch (bad_alloc) {
    cout << "caught bad alloc: " << 123 << '\n';
}

I see.. Liviu
Re: STDCXX-1056 [was: Re: STDCXX forks]
On 09/06/12 22:58, Stefan Teleman wrote: On Thu, Sep 6, 2012 at 7:31 PM, Liviu Nicoara nikko...@hates.ms wrote: There would be a performance degradation. IMHO, it would be minor and would simplify the code considerably. After finally being able to reproduce the defect with SunPro 12.3 on x86_64, I tried to remove the lazy initialization and a (smaller) test case now passes. I am attaching the program and the numpunct patch. With your patches, the performance is much much better: I didn't think about timing it, thanks! The result is somewhat expected without the synchronization. However, the full construction-time initialization is inappropriate if you take into consideration the delegation from the public API to the protected virtual API of the facet class, and the possibility of overrides in subclasses. Also, lazy initialization follows the principle of economy of effort, and caching increases efficiency. If possible in a robust way, I would rather have those two. :-) Liviu

# INFO (S1) (10 lines):
# TEXT:
# COMPILER: Intel C++, __INTEL_COMPILER = 1210, __INTEL_COMPILER_BUILD_DATE = 20111011, __EDG_VERSION__ = 403
# ENVIRONMENT: pentiumpro running linux-elf (Fedora release 17 (Beefy Miracle) (3.5.0-2.fc17.x86_64)) with glibc 2.15
# FILE: 22.locale.numpunct.mt.cpp
# COMPILED: Sep 6 2012, 20:50:13
# COMMENT: thread safety
# +---------------+--------+--------+----------+
# | DIAGNOSTIC    | ACTIVE | TOTAL  | INACTIVE |
# +---------------+--------+--------+----------+
# | (S1) INFO     |     11 |     11 |       0% |
# | (S2) NOTE     |      1 |      1 |       0% |
# | (S8) ERROR    |      0 |      3 |     100% |
# | (S9) FATAL    |      0 |      1 |     100% |
# +---------------+--------+--------+----------+

real 1035.05
user 1191.76
sys 63.49

--Stefan -- And now I see with eye serene The very pulse of the machine.
dbx [was: Re: STDCXX-1056 [was: Re: STDCXX forks]]
On 09/07/12 11:58, Stefan Teleman wrote: On Fri, Sep 7, 2012 at 8:52 AM, Liviu Nicoara nikko...@hates.ms wrote: On 09/06/12 19:54, Martin Sebor wrote: Also, does the 27.objects test pass with this patch? No, it does not. It hangs at the first insertion, line 227. Unfortunately, I cannot debug it because dbx does not work properly in my installation. What are the symptoms of dbx misbehaving? (Maybe I can help.) Thanks! I get this when launching the debugger:

$ dbx -xexec32 t
For information about new features see `help changes'
To remove this message, put `dbxenv suppress_startup_message 7.9' in your .dbxrc
Reading t
Reading ld-linux.so.2
dbx: fetch at 0xf400 failed -- Input/output error
dbx: warning: could not put in breakpoint
dbx: warning: internal handler (-80) made defunct -- could not enable event BPT
dbx: warning: internal handler (-86) made defunct -- could not enable event BPT
...

Then the program just runs from the get-go. If I set breakpoints and rerun, it ignores them. Also, long-running programs cannot be broken into with C-c:

^Cdbx: warning: Interrupt ignored but forwarded to child.
signal INT in (unknown) at 0xf77cf6c4
0xf77cf6c4: movl %edx,%eax
dbx: warning: 'stop' ignored -- while doing rtld handshake
terminating signal 2 SIGINT
dbx: warning: 'stop' ignored -- while doing rtld handshake
(dbx) where
dbx: program is not active
(dbx)

Liviu
Re: dbx [was: Re: STDCXX-1056 [was: Re: STDCXX forks]]
On Fri, Sep 7, 2012 at 12:23 PM, Liviu Nicoara nikko...@hates.ms wrote: I get this when launching the debugger: $ dbx -xexec32 t For information about new features see `help changes' To remove this message, put `dbxenv suppress_startup_message 7.9' in your .dbxrc Reading t Reading ld-linux.so.2 dbx: fetch at 0xf400 failed -- Input/output error dbx: warning: could not put in breakpoint There is fix for this, but for Studio 12.2: http://wesunsolve.net/bugid/id/6545393 It had priority 5 (very low) so that leads me to assume that it hasn't been fixed yet in 12.3. But I will ask at work about 12.3/Linux. Strangely enough, I don't get the error on Fedora 17 with 12.3. --Stefan -- Stefan Teleman KDE e.V. stefan.tele...@gmail.com
Re: STDCXX-1056 [was: Re: STDCXX forks]
On 09/05/12 23:51, Martin Sebor wrote: On 09/05/2012 09:03 PM, Stefan Teleman wrote: [...] Agreed. But: if the choice is between an implementation [1] which breaks the ABI and performs 20% worse -- even in contrived test cases -- than another implementation [2] which doesn't break the ABI and performs better than the first one, why would we even consider the first one? Breaking the ABI is not an option (unless we rev the version). But I'm not sure I understand what you think breaks the ABI. I think Stefan is referring to adding a mutex member variable to the facet in question and breaking binary compatibility. If that is the case, I confused things when I suggested exactly that earlier. A cursory read through the __rw_facet source shows that it inherits from __rw_synchronized in MT builds, therefore each facet already carries its own mutex member. Liviu We don't need to add a new mutex -- we can use the __rw_facet member for the locking. Or did you mean something else? Martin --Stefan -- And now I see with eye serene The very pulse of the machine.
Re: STDCXX-1056 [was: Re: STDCXX forks]
On Thu, Sep 6, 2012 at 9:16 AM, Liviu Nicoara nikko...@hates.ms wrote: I think Stefan is referring to adding a mutex member variable to the facet in question and breaking binary compatibility. If that is the case I have confused things when I suggested exactly that, earlier. A cursory read through the __rw_facet source shows that it inherits from __rw_synchronized in MT builds, therefore each facet carries its own mutex member. On 09/05/12 23:51, Martin Sebor wrote: We don't need to add a new mutex -- we can use the __rw_facet member for the locking. Or did you mean something else? A possible implementation using the __rw_facet mutex could look like this:

template <class _CharT>
inline string numpunct<_CharT>::grouping () const
{
    if (!(_C_flags & _RW::__rw_gr)) {

        numpunct* const __self = _RWSTD_CONST_CAST (numpunct*, this);

        _RWSTD_MT_GUARD (__self->_C_mutex);

        if (!(_C_flags & _RW::__rw_gr)) {
            // [try to] get the grouping first (may throw)
            // then set a flag to avoid future initializations
            __self->_C_grouping = do_grouping ();
            __self->_C_flags   |= _RW::__rw_gr;
        }
    }

    return _C_grouping;
}

Except that it will not work, because the __rw_facet mutex member is being locked in file ../src/facet.cpp, in function __rw_facet::_C_manage, at line 366:

    // acquire lock
    _RWSTD_MT_STATIC_GUARD (_RW::__rw_facet);

This will deadlock because this is the mutex already locked by std::numpunct<T>::grouping(). I've already tested this with 3 compilers, and it does indeed deadlock. So yes, I did indeed mean something different. I meant adding another mutex data member to the numpunct class. --Stefan -- Stefan Teleman KDE e.V. stefan.tele...@gmail.com
Re: STDCXX-1056 [was: Re: STDCXX forks]
On 09/06/2012 09:58 AM, Stefan Teleman wrote: On Thu, Sep 6, 2012 at 9:16 AM, Liviu Nicoara nikko...@hates.ms wrote: I think Stefan is referring to adding a mutex member variable to the facet in question and breaking binary compatibility. If that is the case I have confused things when I suggested exactly that, earlier. A cursory read through the __rw_facet source shows that it inherits from __rw_synchronized in MT builds, therefore each facet carries its own mutex member. On 09/05/12 23:51, Martin Sebor wrote: We don't need to add a new mutex -- we can use the __rw_facet member for the locking. Or did you mean something else? A possible implementation using the __rw_facet mutex could look like this:

template <class _CharT>
inline string numpunct<_CharT>::grouping () const
{
    if (!(_C_flags & _RW::__rw_gr)) {

        numpunct* const __self = _RWSTD_CONST_CAST (numpunct*, this);

        _RWSTD_MT_GUARD (__self->_C_mutex);

        if (!(_C_flags & _RW::__rw_gr)) {
            // [try to] get the grouping first (may throw)
            // then set a flag to avoid future initializations
            __self->_C_grouping = do_grouping ();
            __self->_C_flags   |= _RW::__rw_gr;
        }
    }

    return _C_grouping;
}

Except that it will not work. Because the __rw_facet mutex member is being locked in file ../src/facet.cpp in function __rw_facet::_C_manage at line 366:

    // acquire lock
    _RWSTD_MT_STATIC_GUARD (_RW::__rw_facet);

This locks a different mutex, one unrelated to __rw_facet::_C_mutex. Look at the implementation of _RWSTD_MT_STATIC_GUARD() in rw/_defs.h:

#  define _RWSTD_MT_STATIC_GUARD(type)                          \
          typedef _RW::__rw_type<type, __LINE__> _UniqueType;   \
          _RWSTD_MT_CLASS_GUARD (_UniqueType)

#  define _RWSTD_MT_CLASS_GUARD(type)                           \
          _RWSTD_MT_GUARD (_RW::__rw_get_static_mutex ((type*)0))

and then at __rw_get_static_mutex in rw/_mutex.h. It gets a static (global) mutex from a mutex factory via template instantiation:

    __rw_static_mutex<__rw_facet>::_C_mutex;

This will deadlock because this is the mutex already locked by std::numpunct<T>::grouping().
I've already tested this with 3 compilers, and, it does indeed deadlock. Something else must be locking the mutex then. I quickly looked at __rw_get_punct (and __rw_get_numpunct) in punct.cpp but I couldn't find any signs of the mutex being locked there. The only thing I see being locked there is the global C locale. If it's not the callee it has to be the caller. So yes, I did indeed mean something different. I meant adding another mutex data member to the numpunct class. We can't (and shouldn't need to) add one. We need to be able to make do with the existing member. That's what it's there for. Martin --Stefan
Re: STDCXX-1056 [was: Re: STDCXX forks]
On Thu, Sep 6, 2012 at 2:46 PM, Stefan Teleman stefan.tele...@gmail.com wrote: [steleman@darthvader][/src/steleman/programming/stdcxx-ss122/stdcxx-4.2.1/build/tests][09/06/2012 14:40:11][1084] ./22.locale.numpunct.mt --nthreads=2 --nloops=100 # INFO (S1) (10 lines): # TEXT: # COMPILER: SunPro, __SUNPRO_CC = 0x5120 # ENVIRONMENT: i386 running linux (Fedora release 17 (Beefy Miracle) (3.5.0-2.fc17.x86_64)) with glibc 2.15 # FILE: 22.locale.numpunct.mt.cpp # COMPILED: Sep 6 2012, 14:38:42 # COMMENT: thread safety # CLAUSE: lib.locale.numpunct # NOTE (S2) (5 lines): # TEXT: executing /usr/bin/locale -a /tmp/tmpfile-YzXcb9 # CLAUSE: lib.locale.numpunct # FILE: process.cpp # LINE: 276 grouping: _RWSTD_MT_GUARD (__self-_C_mutex): 0xf774913c _C_get_data: _RWSTD_MT_GUARD (_C_mutex): 0xf774913c [ ... deadlock ... ] And just to make absolutely sure that this isn't a case of SunPro being insane, here's the output from the Intel compiler: [steleman@darthvader][/src/steleman/programming/stdcxx-intel/stdcxx-4.2.1-thread-safe/build/tests][09/06/2012 15:44:23][1390] ./22.locale.numpunct.mt --nthreads=2 --nloops=100 # INFO (S1) (10 lines): # TEXT: # COMPILER: Intel C++, __INTEL_COMPILER = 1210, __INTEL_COMPILER_BUILD_DATE = 20111011, __EDG_VERSION__ = 403 # ENVIRONMENT: pentiumpro running linux-elf (Fedora release 17 (Beefy Miracle) (3.5.0-2.fc17.x86_64)) with glibc 2.15 # FILE: 22.locale.numpunct.mt.cpp # COMPILED: Sep 6 2012, 15:43:29 # COMMENT: thread safety # CLAUSE: lib.locale.numpunct # NOTE (S2) (5 lines): # TEXT: executing locale -a /tmp/tmpfile-FoZR0J # CLAUSE: lib.locale.numpunct # FILE: process.cpp # LINE: 276 grouping: _RWSTD_MT_GUARD (__self-_C_mutex): 0xf74c2424 _C_get_data: _RWSTD_MT_GUARD (_C_mutex): 0xf74c2424 [ ... deadlock ... ] --Stefan -- Stefan Teleman KDE e.V. stefan.tele...@gmail.com
Re: STDCXX-1056 [was: Re: STDCXX forks]
On Sep 5, 2012, at 4:03 PM, Martin Sebor wrote: On 09/05/2012 01:33 PM, Liviu Nicoara wrote: On 09/05/12 15:17, Martin Sebor wrote: On 09/05/2012 12:40 PM, Liviu Nicoara wrote: On 09/05/12 14:09, Stefan Teleman wrote: On Wed, Sep 5, 2012 at 10:52 AM, Martin Sebor mse...@gmail.com wrote: [...] OK so I did a little bit of testing, after looking at the *right* __rw_guard class. :-) I changed the std::numpunct class thusly: [...] I am afraid this would be unsafe, too (if I said otherwise earlier I was wrong). [...] Thoughts? You're right, there's still a problem. We didn't get the double checked locking quite right. I wish for simplicity: eliminate the lazy initialization, and fully initialize the facet in the constructor. Then we'd need no locking on copying its data (std::string takes care of its own copying). I'm not sure how easily we can do that. Almost all of locale is initialized lazily. Some of the layers might depend on the facets being initialized lazily as well. This was a deliberate design choice. One of the constraints was to avoid dynamic initialization or allocation at startup. [...] There would be a performance degradation. IMHO, it would be minor and would simplify the code considerably. After finally being able to reproduce the defect with SunPro 12.3 on x86_64 I tried to remove the lazy initialization and a (smaller) test case now passes. I am attaching the program and the numpunct patch. Will continue to look into it. Although STDCXX does not have an implementation of the atomics library we could probably use the existing, unexposed, atomics API to implement a more robust lazy initialization. 
Liviu

$ cat tests/localization/t.cpp; svn diff

#include <locale>
#include <cstdio>
#include <cstdlib>
#include <unistd.h>

#define MAX_THREADS 16
#define MAX_LOOPS   1000

static bool hold = true;

extern "C" {

static void* f (void* pv)
{
    std::numpunct<char> const &fac =
        *reinterpret_cast<std::numpunct<char>*> (pv);

    while (hold) ;

    for (int i = 0; i != MAX_LOOPS; ++i) {
        std::string const grp = fac.grouping ();
    }

    return 0;
}

}

int main (int argc, char** argv)
{
    std::locale const loc = std::locale (argv [1]);
    std::numpunct<char> const &fac =
        std::use_facet<std::numpunct<char> > (loc);

    pthread_t tid [MAX_THREADS] = { 0 };

    for (int i = 0; i < MAX_THREADS; ++i) {
        if (pthread_create (tid + i, 0, f, (void*)&fac))
            exit (-1);
    }

    sleep (1);
    hold = false;

    for (int i = 0; i < MAX_THREADS; ++i) {
        if (tid [i])
            pthread_join (tid [i], 0);
    }

    return 0;
}

Index: include/loc/_numpunct.h
===
--- include/loc/_numpunct.h (revision 1381575)
+++ include/loc/_numpunct.h (working copy)
@@ -61,24 +61,40 @@ struct numpunct: _RW::__rw_facet
     string_type;

     _EXPLICIT numpunct (_RWSTD_SIZE_T __ref = 0)
-    : _RW::__rw_facet (__ref), _C_flags (0) { }
+    : _RW::__rw_facet (__ref) {
+        _C_grouping      = do_grouping ();
+        _C_truename      = do_truename ();
+        _C_falsename     = do_falsename ();
+        _C_decimal_point = do_decimal_point ();
+        _C_thousands_sep = do_thousands_sep ();
+    }

     virtual ~numpunct () _RWSTD_ATTRIBUTE_NOTHROW;

     // 22.2.3.1.1, p1
-    char_type decimal_point () const;
+    char_type decimal_point () const {
+        return _C_decimal_point;
+    }

     // 22.2.3.1.1, p2
-    char_type thousands_sep () const;
+    char_type thousands_sep () const {
+        return _C_thousands_sep;
+    }

     // 22.2.3.1.1, p3
-    string grouping () const;
+    string grouping () const {
+        return _C_grouping;
+    }

     // 22.2.3.1.1, p4
-    string_type truename () const;
+    string_type truename () const {
+        return _C_truename;
+    }

     // 22.2.3.1.1, p4
-    string_type falsename () const;
+    string_type falsename () const {
+        return _C_falsename;
+    }

     static _RW::__rw_facet_id id;

@@ -112,15 +128,13 @@ protected:

 private:

-    int _C_flags;          // bitmap of cached data valid flags
-    string _C_grouping;    // cached results of virtual members
+    string _C_grouping;
     string_type _C_truename;
     string_type _C_falsename;
     char_type _C_decimal_point;
     char_type _C_thousands_sep;
 };

-
 #ifndef _RWSTD_NO_SPECIALIZED_FACET_ID
 _RWSTD_SPECIALIZED_CLASS

@@ -134,95 +148,6 @@ _RW::__rw_facet_id numpunct<wchar_t>::id
 # endif   // _RWSTD_NO_WCHAR_T
 #endif   // _RWSTD_NO_SPECIALIZED_FACET_ID

-
-template <class _CharT>
-inline _TYPENAME numpunct<_CharT>::char_type
-numpunct<_CharT>::decimal_point () const
-{
-    if (!(_C_flags & _RW::__rw_dp)) {
-
-        numpunct* const __self = _RWSTD_CONST_CAST (numpunct*, this);
-
-        // [try to] get the decimal point first (may throw)
-        // then set a flag to avoid future
Re: STDCXX-1056 [was: Re: STDCXX forks]
I'm not sure how easily we can do that. Almost all of locale is initialized lazily. Some of the layers might depend on the facets being initialized lazily as well. This was a deliberate design choice. One of the constraints was to avoid dynamic initialization or allocation at startup. [...] There would be a performance degradation. IMHO, it would be minor and would simplify the code considerably. After finally being able to reproduce the defect with SunPro 12.3 on x86_64 I tried to remove the lazy initialization and a (smaller) test case now passes. I am attaching the program and the numpunct patch. Out of curiosity, does it still work if you move the locale into the thread function? Like this:

static void* f (void* pv)
{
    std::numpunct<char> const &fac =
        *reinterpret_cast<std::numpunct<char>*> (pv);

    while (hold) ;

    for (int i = 0; i != MAX_LOOPS; ++i) {
        const std::locale loc ("en_US.utf8");
        typedef std::numpunct<char> NumPunct;
        NumPunct const &fac = std::use_facet<NumPunct> (loc);
        std::string const grp = fac.grouping ();
    }

    return 0;
}

Also, does the 27.objects test pass with this patch? I don't know if we have tests for it but writing to cerr/cout in a bad_alloc handler should succeed. I.e., this should not abort:

try {
    struct rlimit rlim = { };
    getrlimit (RLIMIT_DATA, &rlim);
    rlim.rlim_cur = 0;
    setrlimit (RLIMIT_DATA, &rlim);

    for ( ; ; )
        new int;
}
catch (bad_alloc&) {
    cout << "caught bad alloc: " << 123 << '\n';
}

Martin
Re: STDCXX-1056 [was: Re: STDCXX forks]
On Thu, Sep 6, 2012 at 7:31 PM, Liviu Nicoara nikko...@hates.ms wrote: There would be a performance degradation. IMHO, it would be minor and would simplify the code considerably. After finally being able to reproduce the defect with SunPro 12.3 on x86_64 I tried to remove the lazy initialization and a (smaller) test case now passes. I am attaching the program and the numpunct patch. With your patches, the performance is much much better: # INFO (S1) (10 lines): # TEXT: # COMPILER: Intel C++, __INTEL_COMPILER = 1210, __INTEL_COMPILER_BUILD_DATE = 20111011, __EDG_VERSION__ = 403 # ENVIRONMENT: pentiumpro running linux-elf (Fedora release 17 (Beefy Miracle) (3.5.0-2.fc17.x86_64)) with glibc 2.15 # FILE: 22.locale.numpunct.mt.cpp # COMPILED: Sep 6 2012, 20:50:13 # COMMENT: thread safety # +---+--+--+--+ # | DIAGNOSTIC| ACTIVE | TOTAL | INACTIVE | # +---+--+--+--+ # | (S1) INFO | 11 | 11 | 0% | # | (S2) NOTE |1 |1 | 0% | # | (S8) ERROR|0 |3 | 100% | # | (S9) FATAL|0 |1 | 100% | # +---+--+--+--+ real 1035.05 user 1191.76 sys 63.49 --Stefan -- Stefan Teleman KDE e.V. stefan.tele...@gmail.com
Re: STDCXX-1056 [was: Re: STDCXX forks]
On 09/05/12 00:51, Stefan Teleman wrote: On Tue, Sep 4, 2012 at 10:49 PM, Martin Sebor mse...@gmail.com wrote:

template <class _CharT>
inline string numpunct<_CharT>::grouping () const
{
    if (!(_C_flags & _RW::__rw_gr)) {

        numpunct* const __self = _RWSTD_CONST_CAST (numpunct*, this);

        _RWSTD_MT_GUARD (_C_mutex);

        // [try to] get the grouping first (may throw)
        // then set a flag to avoid future initializations
        __self->_C_grouping = do_grouping ();
        __self->_C_flags   |= _RW::__rw_gr;
    }

    return _C_grouping;
}

That's what I wanted to do originally - use a per-object mutex. FWIW, it is pretty obvious to me _now_ that these assignments are not MT-safe, by themselves and also in the context of the test guarding them. You're right, access must be synchronized on a per-facet object basis. Since the data that needs to be protected on assignment is instance data, the mutex used in the guard must be a (yet to be added) instance mutex. That would make it binary incompatible and would go in ... 4.3.x? This works:

template <class _CharT>
inline string numpunct<_CharT>::grouping () const
{
    if (!(_C_flags & _RW::__rw_gr)) {

        numpunct* const __self = _RWSTD_CONST_CAST (numpunct*, this);

        _RWSTD_MT_STATIC_GUARD (_Type);

It seems to me that a static guard is excessive. The only thing that needs guarding is the actual assignment to facet instance members -- a mutex instance member would suffice. And even so, this is still not thread-safe: Two different threads [T1 and T2], seeking two different locales [en_US.UTF-8 and ja_JP.UTF-8], enter std::numpunct<_CharT>::grouping() at the same time - because they are running on two different cores. They both test for

    if (!(_C_flags & _RW::__rw_gr))

and then -- assuming the expression above evaluates to true -- one of them wins the mutex [T1], and the other one [T2] blocks on the mutex. When T1 is done and exits the function, the grouping is set to en_US.UTF-8 and the mutex is released. Now T2 acquires the mutex, and proceeds to setting grouping to ja_JP.UTF-8.
Woe on the caller running from T1 who now thinks he got en_US.UTF-8, but instead he gets ja_JP.UTF-8, which was duly set so by T2, but T1 had no clue about it (remember, the std::string grouping _charT buffer is shared by the caller from T1 and the caller from T2). I am pretty sure that in your example, they would operate on two different instances of the facet class. The static guard would synchronize their running alright but through two different facet instances, copying data from two different places of the locale database. In this respect a static guard is excessive. I would defer it to Martin to validate a course of action but to me it looks like the only problem is the concurrent assignment to facet instance member variables inside the facet member functions. Which could be easily and nicely solved with a plain guard on a mutex ordinary member variable. Thanks! Liviu
Re: STDCXX-1056 [was: Re: STDCXX forks]
That's what I wanted to do originally - use a per-object mutex. Unfortunately the _C_mutex member in __rw::__rw_synchronized is static: That's the wrong __rw_synchronized -- this one is for non-MT builds. The one we get in MT builds is on line 370.

struct __rw_synchronized {
    // static so that it takes up no space
    static _RWSTD_EXPORT __rw_mutex _C_mutex;
    ...

and __rw::__rw_guard doesn't have an appropriate constructor. Intel C++ complains about it too:

/src/steleman/programming/stdcxx-intel/stdcxx-4.2.1-thread-safe/include/loc/_numpunct.h(181): error: no instance of constructor "__rw::__rw_guard::__rw_guard" matches the argument list
            argument types are: (const __rw::__rw_mutex)
      _RWSTD_MT_GUARD (_C_mutex);
      ^

That's because the function is const and the guard takes a non-const reference. This might be a defect in __rw_synchronized: the mutex member should probably be mutable. ... And even so, this is still not thread-safe: Two different threads [T1 and T2], seeking two different locales [en_US.UTF-8 and ja_JP.UTF-8], enter std::numpunct<_CharT>::grouping() at the same time - because they are running on two different cores. Those threads won't have the same numpunct facet. A single facet is always associated with exactly one locale and the relationship can never change. The locale class guarantees this. They both test for

    if (!(_C_flags & _RW::__rw_gr))

and then -- assuming the expression above evaluates to true -- one of them wins the mutex [T1], and the other one [T2] blocks on the mutex. When T1 is done and exits the function, the grouping is set to en_US.UTF-8 and the mutex is released. Now T2 acquires the mutex, and proceeds to setting grouping to ja_JP.UTF-8. Woe on the caller running from T1 who now thinks he got en_US.UTF-8, but instead he gets ja_JP.UTF-8, which was duly set so by T2, but T1 had no clue about it (remember, the std::string grouping _CharT buffer is shared by the caller from T1 and the caller from T2).
So at a minimum, the locking must happen before evaluating the

    if (!(_C_flags & _RW::__rw_gr))

expression. You're right that there is a race here. But the race is benign because T2 will assign the same value to the string as T1 did (because the grouping must be the same in the same locale). This doesn't reallocate the string but simply overwrites each byte with its own value. On all architectures we support this is atomic and safe. Martin This still doesn't solve what ends up being returned in grouping. If we lock at the top of the function, then, when T2 acquires the mutex, the test expression will evaluate to false. Therefore T2 will return whatever is in grouping right now, which happens to be en_US.UTF-8 as set by T1, when T2 really wanted ja_JP.UTF-8. I really think the appropriate fix here -- which would address the performance implications -- is more complex than this. I am thinking about creating and using a (non-publicly accessible) internal locale cache:

    typedef std::map<std::string, std::locale> locale_cache;

where all the locales are stored fully initialized, on demand. There is only one locale instantiation and initialization overhead cost per locale. After a locale has been instantiated and placed into the cache, the caller of any specific locale gets a copy from the cache, fully instantiated and initialized. But this breaks ABI, so I'm thinking it's for stdcxx 5. Thoughts? --Stefan
Re: STDCXX-1056 [was: Re: STDCXX forks]
On Wed, Sep 5, 2012 at 10:52 AM, Martin Sebor mse...@gmail.com wrote: You're right that there is a race here. But the race is benign because T2 will assign the same value to the string as T1 did (because the grouping must be the same in the same locale). This doesn't reallocate the string but simply overwrites each byte with its own value. On all architectures we support this is atomic and safe. OK so I did a little bit of testing, after looking at the *right* __rw_guard class. :-) I changed the std::numpunct class thusly:

template <class _CharT>
class numpunct : public _RW::__rw_facet
{
public:
    // [ ... snip, no changes here ... ]

private:

    int         _C_flags;         // bitmap of cached data valid flags
    string      _C_grouping;      // cached results of virtual members
    string_type _C_truename;
    string_type _C_falsename;
    char_type   _C_decimal_point;
    char_type   _C_thousands_sep;

    mutable _RW::__rw_mutex _C_object_mutex;   // <---
};

And then:

template <class _CharT>
inline string numpunct<_CharT>::grouping () const
{
    if (!(_C_flags & _RW::__rw_gr)) {

        _RWSTD_MT_GUARD (_C_object_mutex);

        // double-test here to avoid re-writing an already written string
        if (!(_C_flags & _RW::__rw_gr)) {

            numpunct* const __self = _RWSTD_CONST_CAST (numpunct*, this);

            // [try to] get the grouping first (may throw)
            // then set a flag to avoid future initializations
            __self->_C_grouping = do_grouping ();
            __self->_C_flags   |= _RW::__rw_gr;
        }
    }

    return _C_grouping;
}

same changes for std::numpunct::truename() and std::numpunct::falsename(). Ran 22.locale.numpunct.mt with the default values for nthreads and nloops.
Yesterday's test results, with the static per-class mutex: # INFO (S1) (10 lines): # TEXT: # COMPILER: Intel C++, __INTEL_COMPILER = 1210, __INTEL_COMPILER_BUILD_DATE = 20111011, __EDG_VERSION__ = 403 # ENVIRONMENT: pentiumpro running linux-elf (Fedora release 17 (Beefy Miracle) (3.5.0-2.fc17.x86_64)) with glibc 2.15 # FILE: 22.locale.numpunct.mt.cpp # COMPILED: Sep 4 2012, 09:11:36 # COMMENT: thread safety # CLAUSE: lib.locale.numpunct # NOTE (S2) (5 lines): # TEXT: executing locale -a /tmp/tmpfile-V7siTq # CLAUSE: lib.locale.numpunct # FILE: process.cpp # LINE: 276 # INFO (S1) (3 lines): # TEXT: testing std::numpunctcharT with 8 threads, 20 iterations each, in 32 locales { C aa_DJ aa_DJ.iso88591 aa_DJ.utf8 aa_ER aa_ER@saaho aa_ER.utf8 aa_ER.utf8@saaho aa_ET aa_ET.utf8 af_ZA af_ZA.iso88591 af_ZA.utf8 am_ET am_ET.utf8 an_ES an_ES.iso885915 an_ES.utf8 ar_AE ar_AE.iso88596 ar_AE.utf8 ar_BH ar_BH.iso88596 ar_BH.utf8 ar_DZ ar_DZ.iso88596 ar_DZ.utf8 ar_EG ar_EG.iso88596 ar_EG.utf8 ar_IN ar_IN.utf8 } # CLAUSE: lib.locale.numpunct [ ... 
] # +---+--+--+--+ # | DIAGNOSTIC| ACTIVE | TOTAL | INACTIVE | # +---+--+--+--+ # | (S1) INFO | 11 | 11 | 0% | # | (S2) NOTE |1 |1 | 0% | # | (S8) ERROR|0 |3 | 100% | # | (S9) FATAL|0 |1 | 100% | # +---+--+--+--+ real 2139.31 user 2406.09 sys 155.61 === Today's test results with the per-object mutex (as shown above): # INFO (S1) (10 lines): # TEXT: # COMPILER: Intel C++, __INTEL_COMPILER = 1210, __INTEL_COMPILER_BUILD_DATE = 20111011, __EDG_VERSION__ = 403 # ENVIRONMENT: pentiumpro running linux-elf (Fedora release 17 (Beefy Miracle) (3.5.0-2.fc17.x86_64)) with glibc 2.15 # FILE: 22.locale.numpunct.mt.cpp # COMPILED: Sep 5 2012, 13:08:03 # COMMENT: thread safety # CLAUSE: lib.locale.numpunct # NOTE (S2) (5 lines): # TEXT: executing locale -a /tmp/tmpfile-ww0nez # CLAUSE: lib.locale.numpunct # FILE: process.cpp # LINE: 276 # INFO (S1) (3 lines): # TEXT: testing std::numpunctcharT with 8 threads, 20 iterations each, in 32 locales { C aa_DJ aa_DJ.iso88591 aa_DJ.utf8 aa_ER aa_ER@saaho aa_ER.utf8 aa_ER.utf8@saaho aa_ET aa_ET.utf8 af_ZA af_ZA.iso88591 af_ZA.utf8 am_ET am_ET.utf8 an_ES an_ES.iso885915 an_ES.utf8 ar_AE ar_AE.iso88596 ar_AE.utf8 ar_BH ar_BH.iso88596 ar_BH.utf8 ar_DZ ar_DZ.iso88596 ar_DZ.utf8 ar_EG ar_EG.iso88596 ar_EG.utf8 ar_IN ar_IN.utf8 } # CLAUSE: lib.locale.numpunct [ ... ] # +---+--+--+--+ # | DIAGNOSTIC| ACTIVE | TOTAL | INACTIVE | # +---+--+--+--+ # | (S1) INFO | 11 | 11 | 0% | # | (S2) NOTE |1 |1 | 0% | # | (S8) ERROR|0 |3 | 100% | # | (S9) FATAL|0 |1
Re: STDCXX-1056 [was: Re: STDCXX forks]
On 09/05/12 14:09, Stefan Teleman wrote: On Wed, Sep 5, 2012 at 10:52 AM, Martin Sebor mse...@gmail.com wrote: [...] OK so I did a little bit of testing, after looking at the *right* __rw_guard class. :-) I changed the std::numpunct class thusly: [...] And then:

template <class _CharT>
inline string numpunct<_CharT>::grouping () const
{
    if (!(_C_flags & _RW::__rw_gr)) {

        _RWSTD_MT_GUARD (_C_object_mutex);

        // double-test here to avoid re-writing an already written string
        if (!(_C_flags & _RW::__rw_gr)) {

            numpunct* const __self = _RWSTD_CONST_CAST (numpunct*, this);

            // [try to] get the grouping first (may throw)
            // then set a flag to avoid future initializations
            __self->_C_grouping = do_grouping ();
            __self->_C_flags   |= _RW::__rw_gr;
        }
    }

    return _C_grouping;
}

I am afraid this would be unsafe, too (if I said otherwise earlier I was wrong). The compiler might re-arrange the protected assignments, such that another thread sees a partially updated object, where the flags are updated and the string not. I don't think we're going to get away with this here without either a simpler and more inefficient top-level locking, or doing away completely with the lazy initialization. Thoughts? Liviu
Re: STDCXX-1056 [was: Re: STDCXX forks]
On 09/05/2012 12:40 PM, Liviu Nicoara wrote: On 09/05/12 14:09, Stefan Teleman wrote: On Wed, Sep 5, 2012 at 10:52 AM, Martin Sebor mse...@gmail.com wrote: [...] OK so I did a little bit of testing, after looking at the *right* __rw_guard class. :-) I changed the std::numpunct class thusly: [...] And then:

template <class _CharT>
inline string numpunct<_CharT>::grouping () const
{
    if (!(_C_flags & _RW::__rw_gr)) {

        _RWSTD_MT_GUARD (_C_object_mutex);

        // double-test here to avoid re-writing an already written string
        if (!(_C_flags & _RW::__rw_gr)) {

            numpunct* const __self = _RWSTD_CONST_CAST (numpunct*, this);

            // [try to] get the grouping first (may throw)
            // then set a flag to avoid future initializations
            __self->_C_grouping = do_grouping ();
            __self->_C_flags   |= _RW::__rw_gr;
        }
    }

    return _C_grouping;
}

I am afraid this would be unsafe, too (if I said otherwise earlier I was wrong). The compiler might re-arrange the protected assignments, such that another thread sees a partially updated object, where the flags are updated and the string not. I don't think we're going to get away with this here without either a simpler and more inefficient top-level locking, or doing away completely with the lazy initialization. Thoughts? You're right, there's still a problem. We didn't get the double checked locking quite right. We need to prevent the reordering both by the compiler and by the hardware so that the reader doesn't get an out of date value of _C_grouping. Making the members volatile and/or moving the body of the first if into an out-of-line function should help with the compiler reordering but we'll need a barrier to prevent the hardware from serving us up stale data (i.e., letting T2 see the updated _C_flags but a _C_grouping that's still being modified). Calling pthread_mutex_unlock does insert a barrier, but the barrier is only executed when both tests pass, and not when the first one fails. We need the barrier in both cases.
This seems like it would get too ugly (and big) to put in an inline function. Martin Liviu
Re: STDCXX-1056 [was: Re: STDCXX forks]
On 09/05/2012 01:33 PM, Liviu Nicoara wrote: On 09/05/12 15:17, Martin Sebor wrote: On 09/05/2012 12:40 PM, Liviu Nicoara wrote: On 09/05/12 14:09, Stefan Teleman wrote: On Wed, Sep 5, 2012 at 10:52 AM, Martin Sebor mse...@gmail.com wrote: [...] OK so I did a little bit of testing, after looking at the *right* __rw_guard class. :-) I changed the std::numpunct class thusly: [...] I am afraid this would be unsafe, too (if I said otherwise earlier I was wrong). [...] Thoughts? You're right, there's still a problem. We didn't get the double checked locking quite right. I wish for simplicity: eliminate the lazy initialization, and fully initialize the facet in the constructor. Then we'd need no locking on copying its data (std::string takes care of its own copying). I'm not sure how easily we can do that. Almost all of locale is initialized lazily. Some of the layers might depend on the facets being initialized lazily as well. This was a deliberate design choice. One of the constraints was to avoid dynamic initialization or allocation at startup. Another was to be able to use iostreams (to a limited extent) in low memory conditions to write an error message to stderr. It's been a long time and the details are more than a little fuzzy. I'll need to spend some time going through the code and refreshing my memory. Martin Liviu
Re: STDCXX-1056 [was: Re: STDCXX forks]
2012/9/5 Stefan Teleman stefan.tele...@gmail.com: On Tue, Sep 4, 2012 at 9:15 PM, Liviu Nicoara nikko...@hates.ms wrote: That is good, right? Yes, it is good. :-) Could not get an Intel build off the ground on account of the LIMITS.cpp test not running to completion because of a possible compiler bug. I posted earlier an account of it. Do you have a support account that allows posting bug reports for Intel's C++ compiler? Oh, yes I remember that now, it happened to me too. I worked around it by copying the LIMITS executable from a gcc build. I don't have a support contract with Intel - I'm using the free compilers at home. I'm guessing Intel must have an open forum for Intel developers using their compilers, so that might be a way to report the bug. But I haven't looked for it. I have not had access to a Solaris box since I left Rogue Wave. Does Sun/Oracle have any program that would allow one to test drive their compiler on a shared box somewhere? I vaguely remember that HP had something like that a while ago. I know Sun had such a program. Oracle I don't really know but my educated guess is not a chance. ;-) Gold and above members of the Oracle Partner Network can* get access to a lab running various Oracle HW and SW. For regular persons Stefan's answer applies 100%. However there's nothing stopping you from downloading a Solaris 11 iso or a virtualbox image from http://www.oracle.com/technetwork/server-storage/solaris11/downloads/index.html installing it in a virtualization solution of your choice (virtualbox works best for me) and installing Studio from http://www.oracle.com/technetwork/server-storage/solarisstudio/downloads/index.html As long as the installation is used for sw development and not for production use you do not need to pay Oracle a dime, you just won't get any updates/patches. (check the licenses for details, I'm no lawyer) P.
* this was announced in April, I have no idea about the current status One way of testing with SunPro is to download the free version of SunPro from Oracle, and use it on Linux (yes, SunPro exists for Linux as well). It works very well on OpenSUSE and Ubuntu (tested and used on both myself, debugged stdcxx on OpenSUSE with SunPro 12.2. Doesn't work at all on Fedora 17 because of the TLS glibc bug (however it used to on earlier Fedora releases). And now for the latest about 1056: I managed (after many, MANY tries) to get a run of 22.locale.numpunct.mt on Solaris SPARC 32-bit build (at work), and without the thread-safety patches applied, and I hit the jackpot: # INFO (S1) (10 lines): # TEXT: # COMPILER: SunPro, __SUNPRO_CC = 0x5100 # ENVIRONMENT: sparc-v8 running sunos-5.11 # FILE: 22.locale.numpunct.mt.cpp # COMPILED: Sep 4 2012, 16:10:03 # COMMENT: thread safety # CLAUSE: lib.locale.numpunct # NOTE (S2) (5 lines): # TEXT: executing /usr/bin/locale -a /var/tmp/tmpfile-MWa4bQ # CLAUSE: lib.locale.numpunct # FILE: process.cpp # LINE: 276 # INFO (S1) (3 lines): # TEXT: testing std::numpunctcharT with 4 threads, 1000 iterations each, in 32 locales { C POSIX af_ZA.UTF-8 ar_AE.UTF-8 ar_BH.UTF-8 ar_DZ.UTF-8 ar_EG.ISO8859-6 ar_EG.UTF-8 ar_IQ.UTF-8 ar_JO.UTF-8 ar_KW.UTF-8 ar_LY.UTF-8 ar_MA.UTF-8 ar_OM.UTF-8 ar_QA.UTF-8 ar_SA.UTF-8 ar_TN.UTF-8 ar_YE.UTF-8 as_IN.UTF-8 az_AZ.UTF-8 be_BY.UTF-8 bg_BG.ISO8859-5 bg_BG.UTF-8 bn_IN.UTF-8 bs_BA.ISO8859-2 bs_BA.UTF-8 ca_ES.ISO8859-1 ca_ES.ISO8859-15 ca_ES.UTF-8 cs_CZ.ISO8859-2 cs_CZ.UTF-8 cs_CZ.UTF-8@euro } # CLAUSE: lib.locale.numpunct [ ... snip ... ] # INFO (S1) (4 lines): # TEXT: creating a thread pool with 4 threads # CLAUSE: lib.locale.numpunct # LINE: 548 Abort (core dump) /builds/steleman/userland-stdcxx-upstream-thread-safety/components/stdcxx/stdcxx-4.2.1/tests/localization/22.locale.numpunct.mt.cpp:140: Assertion '0 == rw_strncmp (grp.c_str (), data.grouping_.c_str ())' failed. 
/builds/steleman/userland-stdcxx-upstream-thread-safety/components/stdcxx/stdcxx-4.2.1/build/lib/libstdcxx4.so.4.2.1'__1cE__rwQ__rw_assert_fail6Fpkc2i2_v_+0x78 [0xff249af8] /builds/steleman/userland-stdcxx-upstream-thread-safety/components/stdcxx/stdcxx-4.2.1/build/tests/22.locale.numpunct.mt'thread_func+0x500 [0x1b050] /lib/libc.so.1'_lwp_start+0x0 [0xf7b553f0] Bingo! At line 140 in 22.locale.numpunct.mt.cpp, we see: RW_ASSERT (0 == rw_strncmp (grp.c_str (), data.grouping_.c_str ())); What this means: between the time std::string grp was initialized at line 133: const std::string grp = np.grouping (); and the time we hit line 140 (the RW_ASSERT), another thread modified the std::string _C_grouping member of the class template template <class _CharT> class numpunct : public _RW::__rw_facet. And because this is the non-patched version of _numpunct.h, the variable std::string grp did not really return a deep copy of _C_grouping, but a shared
Re: STDCXX-1056 [was: Re: STDCXX forks]
On Wed, Sep 5, 2012 at 4:20 PM, Stefan Teleman stefan.tele...@gmail.com wrote: But then there's another aspect -- which I probably failed to highlight in my previous email: the per-object mutex implementation is 20% *slower* than the class-static mutex implementation. class-static implementation: real 2139.31 user 2406.09 sys 155.61 per-object implementation: real 2416.75 user 2694.64 sys 159.49 The above results are for the Intel compiler. Now here are the results with GCC 4.7.0: 1. with the static per-class mutex: # INFO (S1) (10 lines): # TEXT: # COMPILER: gcc 4.7.0, __VERSION__ = 4.7.0 20120507 (Red Hat 4.7.0-5) # ENVIRONMENT: pentiumpro running linux-elf (Fedora release 17 (Beefy Miracle) (3.5.0-2.fc17.x86_64)) with glibc 2.15 # FILE: 22.locale.numpunct.mt.cpp # COMPILED: Sep 5 2012, 06:21:18 # COMMENT: thread safety [ ... ] # +---+--+--+--+ # | DIAGNOSTIC| ACTIVE | TOTAL | INACTIVE | # +---+--+--+--+ # | (S1) INFO | 11 | 11 | 0% | # | (S2) NOTE |1 |1 | 0% | # | (S8) ERROR|0 |3 | 100% | # | (S9) FATAL|0 |1 | 100% | # +---+--+--+--+ real 2165.06 user 2428.08 sys 151.30 2. With the per-object mutex: # INFO (S1) (10 lines): # TEXT: # COMPILER: gcc 4.7.0, __VERSION__ = 4.7.0 20120507 (Red Hat 4.7.0-5) # ENVIRONMENT: pentiumpro running linux-elf (Fedora release 17 (Beefy Miracle) (3.5.0-2.fc17.x86_64)) with glibc 2.15 # FILE: 22.locale.numpunct.mt.cpp # COMPILED: Sep 5 2012, 21:29:56 # COMMENT: thread safety # CLAUSE: lib.locale.numpunct [ ... ] # +---+--+--+--+ # | DIAGNOSTIC| ACTIVE | TOTAL | INACTIVE | # +---+--+--+--+ # | (S1) INFO | 11 | 11 | 0% | # | (S2) NOTE |1 |1 | 0% | # | (S8) ERROR|0 |3 | 100% | # | (S9) FATAL|0 |1 | 100% | # +---+--+--+--+ real 2438.70 user 2726.44 sys 155.79 About the same percentage difference as the Intel compiler. --Stefan --- Stefan Teleman KDE e.V. stefan.tele...@gmail.com
Re: STDCXX-1056 [was: Re: STDCXX forks]
On Wed, Sep 5, 2012 at 10:55 PM, Martin Sebor mse...@gmail.com wrote: I suspect the difference is due to the overhead of the repeated initialization and destruction of the per-object mutex in the test. The test repeatedly creates (and discards) named locale objects. The per-class mutex is initialized just once in the process, no matter how many facet objects (how many distinct named locales) the test creates. But the per-object mutex must be created (and destroyed) for each named locale. Agreed. But: if the choice is between an implementation [1] which breaks the ABI and performs 20% worse -- even in contrived test cases -- and another implementation [2] which doesn't break the ABI and performs better, why would we even consider the first one? --Stefan -- Stefan Teleman KDE e.V. stefan.tele...@gmail.com
Re: STDCXX-1056 [was: Re: STDCXX forks]
On 09/05/2012 09:03 PM, Stefan Teleman wrote: On Wed, Sep 5, 2012 at 10:55 PM, Martin Sebor mse...@gmail.com wrote: I suspect the difference is due to the overhead of the repeated initialization and destruction of the per-object mutex in the test. The test repeatedly creates (and discards) named locale objects. The per-class mutex is initialized just once in the process, no matter how many facet objects (how many distinct named locales) the test creates. But the per-object mutex must be created (and destroyed) for each named locale. Agreed. But: if the choice is between an implementation which [1] breaks ABI and [2] performs 20% worse -- even in contrived test cases -- than another implementation [2] which doesn't break ABI, and performs better than the first one, why would we even consider the first one? Breaking the ABI is not an option (unless we rev the version). But I'm not sure I understand what you think breaks the ABI. We don't need to add a new mutex -- we can use the __rw_facet member for the locking. Or did you mean something else? Martin --Stefan
Re: STDCXX-1056 [was: Re: STDCXX forks]
Incidentally, a benchmark that I would expect to be more representative of real programs than the test we've been using is one that creates a small number of named locales up front and then uses each in a loop in each thread (as opposed to creating and destroying it in each iteration and thread). With this setup, the per-object mutex implementation should be much faster than the one with the per-class mutex because of the absence of contention. The test has an option that exercises this setup: --shared-locale Martin On 09/05/2012 09:51 PM, Martin Sebor wrote: On 09/05/2012 09:03 PM, Stefan Teleman wrote: On Wed, Sep 5, 2012 at 10:55 PM, Martin Sebor mse...@gmail.com wrote: I suspect the difference is due to the overhead of the repeated initialization and destruction of the per-object mutex in the test. The test repeatedly creates (and discards) named locale objects. The per-class mutex is initialized just once in the process, no matter how many facet objects (how many distinct named locales) the test creates. But the per-object mutex must be created (and destroyed) for each named locale. Agreed. But: if the choice is between an implementation which [1] breaks ABI and [2] performs 20% worse -- even in contrived test cases -- than another implementation [2] which doesn't break ABI, and performs better than the first one, why would we even consider the first one? Breaking the ABI is not an option (unless we rev the version). But I'm not sure I understand what you think breaks the ABI. We don't need to add a new mutex -- we can use the __rw_facet member for the locking. Or did you mean something else? Martin --Stefan
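The granularity trade-off being debated here can be sketched with a toy example. To be clear, this is illustrative only: the two classes below are invented for this sketch and are not stdcxx code. A per-class static mutex is initialized once per process but serializes threads even when they operate on different objects; a per-object mutex pays a construction/destruction cost per object but has no cross-object contention.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-ins illustrating the two locking granularities.
struct per_class_counter {
    static std::mutex mutex_;   // one lock shared by every object of the class
    long value = 0;
    void bump () {
        // Threads touching *different* objects still serialize here.
        std::lock_guard<std::mutex> guard (mutex_);
        ++value;
    }
};
std::mutex per_class_counter::mutex_;

struct per_object_counter {
    std::mutex mutex_;          // one lock per object: per-object setup cost,
    long value = 0;             // but no contention across distinct objects
    void bump () {
        std::lock_guard<std::mutex> guard (mutex_);
        ++value;
    }
};
```

With a small number of long-lived objects (the --shared-locale setup Martin describes), the per-object variant wins because unrelated objects never contend; with many short-lived objects (the default test setup), the per-object mutex's construction overhead dominates instead.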
Re: STDCXX-1056 [was: Re: STDCXX forks]
On Sep 4, 2012, at 10:34 AM, Stefan Teleman wrote: On Tue, Sep 4, 2012 at 7:35 AM, Liviu Nicoara nikko...@hates.ms wrote: By no means I am dismissing it. It is at the very least an issue of efficiency. I will try an Intel C++ build on x86_64 at some point today. What build type was that? I also notice that you have 4.2.1 in your path? Are you building out of 4.2.1. tag? I built off 4.2.x branch which also has support for custom timeouts (--soft-timeout) in rw_test. Yes, this is 4.2.1 with all the Solaris patches which can be applied on Linux. The environment is: # INFO (S1) (10 lines): # TEXT: # COMPILER: Intel C++, __INTEL_COMPILER = 1210, __INTEL_COMPILER_BUILD_DATE = 20111011, __EDG_VERSION__ = 403 # ENVIRONMENT: pentiumpro running linux-elf (Fedora release 17 (Beefy Miracle) (3.5.0-2.fc17.x86_64)) with glibc 2.15 # FILE: 22.locale.numpunct.mt.cpp # COMPILED: Sep 4 2012, 09:11:36 # COMMENT: thread safety # CLAUSE: lib.locale.numpunct [ ... snip ... ] With all the thread-safety Solaris patches for 1056 applied. The test runs to completion: # NOTE (S2) (5 lines): # TEXT: executing locale -a /tmp/tmpfile-Cej8JU # CLAUSE: lib.locale.numpunct # FILE: process.cpp # LINE: 276 # INFO (S1) (3 lines): # TEXT: testing std::numpunctcharT with 8 threads, 20 iterations each, in 32 locales { C aa_DJ aa_DJ.iso88591 aa_DJ.utf8 aa_ER aa_ER@saaho aa_ER.utf8 aa_ER.utf8@saaho aa_ET aa_ET.utf8 af_ZA af_ZA.iso88591 af_ZA.utf8 am_ET am_ET.utf8 an_ES an_ES.iso885915 an_ES.utf8 ar_AE ar_AE.iso88596 ar_AE.utf8 ar_BH ar_BH.iso88596 ar_BH.utf8 ar_DZ ar_DZ.iso88596 ar_DZ.utf8 ar_EG ar_EG.iso88596 ar_EG.utf8 ar_IN ar_IN.utf8 } # CLAUSE: lib.locale.numpunct [ ... snip ... 
] # +---+--+--+--+ # | DIAGNOSTIC| ACTIVE | TOTAL | INACTIVE | # +---+--+--+--+ # | (S1) INFO | 11 | 11 | 0% | # | (S2) NOTE |1 |1 | 0% | # | (S8) ERROR|0 |3 | 100% | # | (S9) FATAL|0 |1 | 100% | # +---+--+--+--+ real 2139.31 user 2406.09 sys 155.61 [steleman@darthvader][/src/steleman/programming/stdcxx-intel/stdcxx-4.2.1-thread-safe/build/tests][09/04/2012 9:50:14][1300] These tests running to completion (with the patches applied) are 100% reproducible on every single run, on Linux and Solaris (Intel and SPARC). That is good, right? The timings are the output of /usr/bin/time -p. Yes, ~40 minutes wall clock run time for one test is quite a lot. But it runs to completion, with zero errors. And when taking into consideration that there are 8 threads with 20 iterations, maybe it's not that much. FWIW, SELinux is also fully enabled in my environment. OK, so that clears this build at least: maybe inefficient, but it runs to completion with no crashes. The timeouts can also be a symptom of race conditions/deadlocks, or of attempting to access invalid thread stacks or partially written data. I would look at timeouts from the standpoint of the progression of the run: if the run does not progress at all, as in the threads are deadlocked, then I would consider that a defect. If the run progresses, it is a mere inefficiency, however bad. It's very platform-specific. On Solaris SPARC (either 32-bit or 64-bit) I never get timeouts, I always get either SEGV or SIGBUS. On Intel it appears that timeouts are the prevailing symptom. Could not get an Intel build off the ground on account of the LIMITS.cpp test not running to completion because of a possible compiler bug. I posted earlier an account of it. Do you have a support account that allows posting bug reports for Intel's C++ compiler? I am not sure what an explicit run timeout would add, except hiding the hang/deadlock/memory corruption problem behind a timeout.
The timeout option allowed me to run the program with a larger value, such that the alarm would not trigger and allow the program to run to completion. Before enlarging the timeout value, the individual tests would time out. I did not have access to a Solaris box since I left Rogue Wave. Does Sun/Oracle have any program that would allow one to test drive their compiler on a shared box somewhere? I vaguely remember that HP had something like that a while ago. Thanks! Liviu
Re: STDCXX-1056 [was: Re: STDCXX forks]
I recall discussing this when you opened the defect but I'm not sure what the outcome of the discussion was. I looked at the code some more just now and I agree with you that (at least the numpunct) facet isn't thread safe. The premise was that we didn't need any locking because the facet never changes after it's initialized, and the same data member can safely be assigned multiple times. The problem is that the string data members must be initialized before they can be assigned, and assignment to the same member must be guarded against concurrent accesses; there seems to be no mechanism that guarantees that this will be true when the same facet is used from within multiple threads. For example, the function below tries to avoid re-initialization of _C_grouping, but the |= expression can be reordered either by the compiler or by the hardware to complete before the assignment to the member. The function also fails to protect the assignment to the member from taking place simultaneously in multiple threads.

template <class _CharT>
inline string numpunct<_CharT>::grouping () const
{
    if (!(_C_flags & _RW::__rw_gr)) {
        numpunct* const __self = _RWSTD_CONST_CAST (numpunct*, this);
        // [try to] get the grouping first (may throw)
        // then set a flag to avoid future initializations
        __self->_C_grouping = do_grouping ();
        __self->_C_flags |= _RW::__rw_gr;
    }
    return _C_grouping;
}

We need a fix but I don't think the one in the patch attached to the issue is a good solution. Locking all objects of the template specialization is far too coarse. Even having a lock per object would kill iostream performance. Creating a deep copy on return from each of the functions would also slow things down noticeably. The efficient solution must avoid locking in the common case (i.e., after the facet has been initialized).
It can avoid locking around the POD data members altogether since those can safely be assigned multiple times (on common hardware), but it needs to ensure that the string data members are initialized before they are assigned to and are assigned at most once simultaneously. If we introduce a lock, it must be per object and not per type. With that, I would expect the following function to fix the problem in the one above. Let me know what you think.

template <class _CharT>
inline string numpunct<_CharT>::grouping () const
{
    if (!(_C_flags & _RW::__rw_gr)) {
        numpunct* const __self = _RWSTD_CONST_CAST (numpunct*, this);
        _RWSTD_MT_GUARD (_C_mutex);
        // [try to] get the grouping first (may throw)
        // then set a flag to avoid future initializations
        __self->_C_grouping = do_grouping ();
        __self->_C_flags |= _RW::__rw_gr;
    }
    return _C_grouping;
}

Martin On 09/04/2012 07:40 PM, Stefan Teleman wrote: On Tue, Sep 4, 2012 at 9:15 PM, Liviu Nicoara nikko...@hates.ms wrote: That is good, right? Yes, it is good. :-) Could not get an Intel build off the ground on account of the LIMITS.cpp test not running to completion because of a possible compiler bug. I posted earlier an account of it. Do you have a support account that allows posting bug reports for Intel's C++ compiler? Oh, yes, I remember that now; it happened to me too. I worked around it by copying the LIMITS executable from a gcc build. I don't have a support contract with Intel - I'm using the free compilers at home. I'm guessing Intel must have an open forum for Intel developers using their compilers, so that might be a way to report the bug. But I haven't looked for it. I have not had access to a Solaris box since I left Rogue Wave. Does Sun/Oracle have any program that would allow one to test drive their compiler on a shared box somewhere? I vaguely remember that HP had something like that a while ago. I know Sun had such a program. Oracle I don't really know but my educated guess is not a chance.
;-) One way of testing with SunPro is to download the free version of SunPro from Oracle, and use it on Linux (yes, SunPro exists for Linux as well). It works very well on OpenSUSE and Ubuntu (tested and used on both myself, debugged stdcxx on OpenSUSE with SunPro 12.2. Doesn't work at all on Fedora 17 because of the TLS glibc bug (however it used to on earlier Fedora releases). And now for the latest about 1056: I managed (after many, MANY tries) to get a run of 22.locale.numpunct.mt on Solaris SPARC 32-bit build (at work), and without the thread-safety patches applied, and I hit the jackpot: # INFO (S1) (10 lines): # TEXT: # COMPILER: SunPro, __SUNPRO_CC = 0x5100 # ENVIRONMENT: sparc-v8 running sunos-5.11 # FILE: 22.locale.numpunct.mt.cpp # COMPILED: Sep 4 2012, 16:10:03 # COMMENT: thread safety # CLAUSE: lib.locale.numpunct # NOTE (S2) (5 lines): # TEXT: executing /usr/bin/locale -a /var/tmp/tmpfile-MWa4bQ # CLAUSE:
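For comparison, the initialize-once pattern discussed above is usually written today with C++11 atomics (not an option for stdcxx 4.x, and the class and member names below are invented for illustration, not stdcxx identifiers): an acquire load of the flag on the fast path, a lock plus re-check on the slow path, and a release store only after the string is fully constructed, so no thread can observe the flag set without also observing the initialized data.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for the facet; every name here is made up.
class lazy_grouping {
public:
    std::string grouping () const {
        // Fast path: if the acquire load sees the flag, the release store
        // below guarantees grouping_ is fully constructed and visible.
        if (!init_done_.load (std::memory_order_acquire)) {
            std::lock_guard<std::mutex> guard (init_mutex_);
            // Re-check under the lock so only one thread initializes.
            if (!init_done_.load (std::memory_order_relaxed)) {
                grouping_ = compute_grouping ();   // may throw; flag still unset
                init_done_.store (true, std::memory_order_release);
            }
        }
        return grouping_;   // a copy; reads are safe once initialized
    }
private:
    static std::string compute_grouping () { return "\3"; }
    mutable std::mutex init_mutex_;
    mutable std::atomic<bool> init_done_ {false};
    mutable std::string grouping_;
};
```

This addresses exactly the two defects Martin identifies: the release/acquire pair prevents the flag from becoming visible before the string assignment completes, and the lock with a re-check prevents two threads from assigning the member simultaneously.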
Re: STDCXX-1056 [was: Re: STDCXX forks]
On Tue, Sep 4, 2012 at 10:49 PM, Martin Sebor mse...@gmail.com wrote:

template <class _CharT>
inline string numpunct<_CharT>::grouping () const
{
    if (!(_C_flags & _RW::__rw_gr)) {
        numpunct* const __self = _RWSTD_CONST_CAST (numpunct*, this);
        _RWSTD_MT_GUARD (_C_mutex);
        // [try to] get the grouping first (may throw)
        // then set a flag to avoid future initializations
        __self->_C_grouping = do_grouping ();
        __self->_C_flags |= _RW::__rw_gr;
    }
    return _C_grouping;
}

That's what I wanted to do originally - use a per-object mutex. Unfortunately the _C_mutex member in __rw::__rw_synchronized is static:

struct __rw_synchronized {
    // static so that it takes up no space
    static _RWSTD_EXPORT __rw_mutex _C_mutex;
    void _C_lock () { }
    void _C_unlock () { }
    __rw_guard _C_guard () { return __rw_guard (_C_mutex); }

and __rw::__rw_guard doesn't have an appropriate constructor. Intel C++ complains about it too: /src/steleman/programming/stdcxx-intel/stdcxx-4.2.1-thread-safe/include/loc/_numpunct.h(181): error: no instance of constructor __rw::__rw_guard::__rw_guard matches the argument list argument types are: (const __rw::__rw_mutex) _RWSTD_MT_GUARD (_C_mutex); ^ This works:

template <class _CharT>
inline string numpunct<_CharT>::grouping () const
{
    if (!(_C_flags & _RW::__rw_gr)) {
        numpunct* const __self = _RWSTD_CONST_CAST (numpunct*, this);
        _RWSTD_MT_STATIC_GUARD (_Type);
        // [try to] get the grouping first (may throw)
        // then set a flag to avoid future initializations
        __self->_C_grouping = do_grouping ();
        __self->_C_flags |= _RW::__rw_gr;
    }
    return _C_grouping;
}

Although I'm not sure of the performance difference between _RWSTD_MT_STATIC_GUARD and _RWSTD_MT_CLASS_GUARD for this particular problem. I'm going with nothing in real life. :-) And even so, this is still not thread-safe: two different threads [T1 and T2], seeking two different locales [en_US.UTF-8 and ja_JP.UTF-8], enter std::numpunct<_CharT>::grouping() at the same time - because they are running on two different cores.
They both test for if (!(_C_flags & _RW::__rw_gr)) and then -- assuming the expression above evaluates to true -- one of them wins the mutex [T1], and the other one [T2] blocks on the mutex. When T1 is done and exits the function, the grouping is set to en_US.UTF-8 and the mutex is released. Now T2 acquires the mutex, and proceeds to setting grouping to ja_JP.UTF-8. Woe on the caller running from T1 who now thinks he got en_US.UTF-8, but instead he gets ja_JP.UTF-8, which was duly set so by T2, but T1 had no clue about it (remember, the std::string grouping buffer is shared by the caller from T1 and the caller from T2). So at a minimum, the locking must happen before evaluating the if (!(_C_flags & _RW::__rw_gr)) expression. This still doesn't solve what ends up being returned in grouping. If we lock at the top of the function, then, when T2 acquires the mutex, the test expression will evaluate to false. Therefore T2 will return whatever is in grouping right now, which happens to be en_US.UTF-8 as set by T1, when T2 really wanted ja_JP.UTF-8. I really think the appropriate fix here -- which would address the performance implications -- is more complex than this. I am thinking about creating and using a (non-publicly accessible) internal locale cache: typedef std::map<std::string, std::locale> locale_cache; where all the locales are stored fully initialized, on demand. There is only one locale instantiation and initialization overhead cost per locale. After a locale has been instantiated and placed into the cache, the caller of any specific locale gets a copy from the cache, fully instantiated and initialized. But this breaks the ABI, so I'm thinking it's for stdcxx 5. Thoughts? --Stefan -- Stefan Teleman KDE e.V. stefan.tele...@gmail.com
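The proposed internal cache can be sketched roughly as follows. Hedged heavily: the class, the make_locale helper, and the use of std::string as a stand-in for a fully initialized locale object are all invented for illustration; this sketch also sidesteps the ABI question entirely.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical sketch of a name-keyed cache of fully initialized objects;
// none of these names come from stdcxx.
class locale_cache {
public:
    std::string get (const std::string& name) {
        std::lock_guard<std::mutex> guard (mutex_);
        std::map<std::string, std::string>::iterator it = cache_.find (name);
        if (it == cache_.end ())
            // Pay the full initialization cost once per distinct name.
            it = cache_.insert (std::make_pair (name, make_locale (name))).first;
        return it->second;   // callers receive a copy, never a shared buffer
    }
private:
    static std::string make_locale (const std::string& name) {
        return "initialized:" + name;   // placeholder for the expensive init
    }
    std::mutex mutex_;
    std::map<std::string, std::string> cache_;
};
```

The design point is that each caller gets a fully initialized copy, so the lazy-initialization race inside the facet never arises; the trade-off is one lock around every lookup, which would itself need the same contention analysis as the per-class mutex.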
Re: STDCXX-1056 [was: Re: STDCXX forks]
On Mon, Sep 3, 2012 at 11:57 PM, Stefan Teleman stefan.tele...@gmail.com wrote: On Mon, Sep 3, 2012 at 3:19 PM, Liviu Nicoara nikko...@hates.ms wrote: I tried, unsuccessfully, to reproduce the failure observed by Martin in 22.locale.moneypunct.mt, in both debug and optimized, wide and narrow builds on a 16x machine: I can reproduce it consistently on Solaris without the patches. It has a very high failure rate. It's also been reported at least once before, here: http://old.nabble.com/22.locale.numpunct.mt-run-hangs-td20133013.html I remember there were more hits about 22.locale.numpunct.mt in Google when I searched for it a while ago, but I didn't look as closely now. I'll create a Solaris build without my patches tomorrow and send the output. FWIW, my today's builds without the patches with the Intel compiler: COMPILER: Intel C++, __INTEL_COMPILER = 1210, __INTEL_COMPILER_BUILD_DATE = 20111011, __EDG_VERSION__ = 403 22.locale.numpunct.mt ran for 3 hours (wall clock time) without ever completing or doing anything except flatlining the cpu at 115%, on both 32-bit and 64-bit. This is on: [steleman@darthvader][/src/steleman/programming/stdcxx-intel/stdcxx-4.2.1-thread-safe/build/tests][09/04/2012 0:21:13][1181] uname -a Linux darthvader.stefanteleman.org 3.5.0-2.fc17.x86_64 #1 SMP Mon Jul 30 14:48:59 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux I can't (yet) rebuild with GCC 4.7.0 because Fedora 17's build of GCC C++ is a mess (libsupc++.a requires TLS but glibc was built with TLS disabled). --Stefan -- Stefan Teleman KDE e.V. stefan.tele...@gmail.com
Re: STDCXX forks
On 08/31/12 12:20, Stefan Teleman wrote: On Fri, Aug 31, 2012 at 8:40 AM, Liviu Nicoara nikko...@hates.ms wrote: Stefan's seems like a complete git-ification of the whole Apache repository but with no changes I could detect. Not quite. :-) [...] The official Oracle port for Solaris 10 and 11, which I maintain, is here: http://src.opensolaris.org/source/xref/userland/gate/components/stdcxx/ Hi Stefan, I have gone through the patches. Specifically, I have spent more time looking into the mutex alignment changes and the C++ C library headers patches, and I only read the others. In order: The test extensions seem to be genuine by and large, but I would further analyze them after I find out what it is they are addressing (test cases?). The regression tests whose names contain references to internal bug numbers require a bit more analysis as to their usefulness. Of course, an explanation attached to each would help avoid duplicating your work. I have not cross-checked them against JIRA. Some of the compiler characterization changes, as well as the associated GNUmakefiles, seem to be specific to your port, e.g., the GNUmakefile and GNUmakefile.cfg changes. I may have spotted other issues but I would wait for your feed-back first. The C++ C library headers seem to have been re-written for your port. I am unsure why you needed this, but it surely breaks the original intent of these headers' structure. I have also noticed that you stripped the Apache notice and added an Oracle copyright notice on them. This pretty much sums up my first impression. If you were willing to submit them one by one (*), complete with test cases and complete patches (ChangeLog entries and all), I would put in the necessary review time and provide feed-back asap. Please let me know if I missed anything. I sincerely hope this helps. Thanks! Liviu (*) As in issue by issue, not file by file.
Re: STDCXX forks
On 08/31/12 08:58, C. Bergström wrote: On 08/31/12 07:40 PM, Liviu Nicoara wrote: Please correct me if I am wrong. I have seen two forks on github, one belonging to C. Bergstrom/Pathscale and the other to Stefan Teleman. The first one contains a number of genuine code changes outside the Apache repository, but without canonical and meaningful commit comments, and without accompanying test cases. We can help clean this up if it makes a real difference Some carry what seem to be bug id's from a private bug database. Some of the changes, e.g., 84d01405124b8c08bb43fdc3755ada2592449727 iirc we renamed the files because of some problem with cmake - it was trying to compile those files instead of treating them like headers. , fa9a45fb36e45b71ba6b4ae93b50bd6679549420, seem odd. I forget why we did this tbh - I'll try to get an answer in case it comes up again as a question Are you guys dropping support for any platforms? Not intentionally - We are using/testing this on Solaris, FBSD and Linux - We may add OSX/Win to that list soon, but I can't promise at this point. For targets we are only able to test x86 and x86_64 Hi Christopher, I finished looking through the github changes. If you are interested in contributing some of those changes back, as complete patches [1], I would volunteer my time to promptly review them and work with you to get them in, as needed. In the meantime, if you will, could you please elaborate on the two changes mentioned above? The first one changes the traditional file extension on the template implementation files, the second affects the library building on platforms like I mentioned in another post. AFAIK STDCXX does not implement or follow Boost conventions and/or library file structure. HTH. Thanks! Liviu [1] http://stdcxx.apache.org/bugs.html#patch_format
Re: STDCXX forks
On Sat, Sep 1, 2012 at 12:15 PM, Liviu Nicoara nikko...@hates.ms wrote: Hi Stefan, I have gone through the patches. Specifically, I have spent more time looking into the mutex alignment changes and the C++ C library headers patches, and I only read the others. In order: The test extensions seem to be genuine by and large, but I would further analyze them after I find out what it is they are addressing (test cases?). The regression tests whose names contain references to internal bug numbers require a bit more analysis as to their usefulness. Of course, an explanation attached to each would help avoid duplicating your work. I have not cross-checked them against JIRA. Some of the compiler characterization changes, as well as the associated GNUmakefiles, seem to be specific to your port, e.g., the GNUmakefile and GNUmakefile.cfg changes. I may have spotted other issues but I would wait for your feed-back first. The C++ C library headers seem to have been re-written for your port. I am unsure why you needed this, but it surely breaks the original intent of these headers' structure. I have also noticed that you stripped the Apache notice and added an Oracle copyright notice on them. This pretty much sums up my first impression. Hi Yes, you are correct. :-) To begin with: the compiler flags/GNUmakefile changes are very specific to the SunPro compilers and to our internal build system. These changes are most likely not suitable for inclusion in the canonical stdcxx, except maybe for the sunpro.config changes, in case someone would like to be able to replicate our builds. I'd like to mention that, in Solaris, Apache stdcxx is a system library. About the Standard C Library forwarding header files: these changes are specific to Solaris. The reason behind them is: the Solaris architectural rules, which can be best summarized as: there can be only one of each.
In other words, it is Verboten, in Solaris, to duplicate the Standard C Library header files (or any other header file for that matter). The Solaris Standard C Library header files are C++-clean - they are required to be so, by the same architectural rules. Again, these changes are specific to Solaris, and are probably not portable across other implementations. I know for a fact that they are not portable to either the GCC or Intel compilers (with which I test regularly on Linux, in addition to SunPro). So these two groups of changesets can be ignored. I opened yesterday STDCXX-1066: https://issues.apache.org/jira/browse/STDCXX-1066 about the pthread_mutex_t/pthread_cond_t alignment on SPARCV8. I'll have patches done this weekend. Achtung: the patchset is very large and touches a very large number of files. It's strange that I didn't get an email about STDCXX-1066. I'd also like to talk about STDCXX-1056: https://issues.apache.org/jira/browse/STDCXX-1056 which has already had an initial discussion, and for which I have attached a patch. This issue also addresses (indirectly) linkage when building with GCC. On the recent versions of GCC that I have tested with, passing -lsupc++ on the link line automatically links with the GNU libstdc++6.so (on top of linking with stdcxx), and that's just bad. And then I'll have to cross-reference the patches which refer to our internal bug numbers, because most of them are quite old and right now, off the top of my head, I can't remember what they are. :-) --Stefan -- Stefan Teleman KDE e.V. stefan.tele...@gmail.com
Re: STDCXX forks
On Sep 1, 2012, at 1:52 PM, Stefan Teleman wrote: On Sat, Sep 1, 2012 at 12:15 PM, Liviu Nicoara nikko...@hates.ms wrote: [...] This pretty much sums up my first impression. [...] To begin with: the compiler flags/GNUmakefile changes are very specific to the SunPro compilers and to our internal build system. These changes are most likely not suitable for inclusion in the canonical stdcxx, except maybe for the sunpro.config changes, in case someone would like to be able to replicate our builds. I'd like to mention that, in Solaris, Apache stdcxx is a system library. Good point. About the Standard C Library forwarding header files: these changes are specfic to Solaris. The reason behind them is: the Solaris architectural rules, which can be best summarized as: there can be only one of each. In other words, it is Verboten, in Solaris, to duplicate the Standard C Library header files (or any other header file for that matter). The Solaris Standard C Library header files are C++-clean - they are required to be so, by the same architectural rules. Again, these changes are specific to Solaris, and are probably not portable across other implementations. I know for a fact that they are not portable for either the GCC or Intel compilers (with which I test regularly on Linux, in addition to SunPro). So these two groups of changesets can be ignored. Got it. I opened yesterday STDCXX-1066: https://issues.apache.org/jira/browse/STDCXX-1056 about the pthread_mutex_t/pthread_cond_t alignment on SPARCV8. I'll have patches done this weekend. Achtung: the patchset is very large and touches a very large number of files. It's strange that I didn't get an email about STDCXX-1066. Acknowledged. I have seen the individual patches on those two websites. I'd also like to talk about STDCXX-1056: https://issues.apache.org/jira/browse/STDCXX-1056 which has already had an initial discussion, and for which I have attached a patch. 
This issue also addresses (indirectly) linkage when building with GCC. On the recent versions of GCC that I have tested with, passing -supc++ on link line automatically links with the GNU libstdc++6.so (on top of linking with stdcxx), and that just bad. I briefly looked at it, will delve into it later. And then I'll have to cross reference the patches which refer to our internal bug numbers because most of them are quite old and right now, off the top of my head, I can't remember what they are. :-) Thanks! Liviu
Re: STDCXX forks
On 08/31/12 07:40 PM, Liviu Nicoara wrote: Please correct me if I am wrong. I have seen two forks on github, one belonging to C. Bergstrom/Pathscale and the other to Stefan Teleman. The first one contains a number of genuine code changes outside the Apache repository, but without canonical and meaningful commit comments, and without accompanying test cases.

We can help clean this up if it makes a real difference.

Some carry what seem to be bug id's from a private bug database. Some of the changes, e.g., 84d01405124b8c08bb43fdc3755ada2592449727

IIRC we renamed the files because of some problem with cmake - it was trying to compile those files instead of treating them like headers.

, fa9a45fb36e45b71ba6b4ae93b50bd6679549420, seem odd.

I forget why we did this, tbh - I'll try to get an answer in case it comes up again as a question.

Are you guys dropping support for any platforms?

Not intentionally - we are using/testing this on Solaris, FBSD and Linux. We may add OSX/Win to that list soon, but I can't promise at this point. For targets we are only able to test x86 and x86_64.

NetBSD also has a fork, I believe, but I'm not sure whether it's public or what its status is.

Stefan's seems like a complete git-ification of the whole Apache repository but with no changes I could detect. He has quite a number of patches and I forget where those are kept. I'm guessing a lot of his fixes target KDE/Qt apps and the Perennial C++VS testsuite. http://www.peren.com/pages/cppvs_set.htm
Re: STDCXX forks
On 08/31/12 08:58, C. Bergström wrote: On 08/31/12 07:40 PM, Liviu Nicoara wrote: Please correct me if I am wrong. I have seen two forks on github, one belonging to C. Bergstrom/Pathscale and the other to Stefan Teleman. The first one contains a number of genuine code changes outside the Apache repository, but without canonical and meaningful commit comments, and without accompanying test cases.

We can help clean this up if it makes a real difference.

That is appreciated. It would make a huge difference to anybody coming anew to the project, or just catching up, like me.

Some carry what seem to be bug id's from a private bug database. Some of the changes, e.g., 84d01405124b8c08bb43fdc3755ada2592449727

IIRC we renamed the files because of some problem with cmake - it was trying to compile those files instead of treating them like headers.

, fa9a45fb36e45b71ba6b4ae93b50bd6679549420, seem odd.

I forget why we did this, tbh - I'll try to get an answer in case it comes up again as a question.

My point exactly; a good comment explains the whys.

Are you guys dropping support for any platforms?

Not intentionally - we are using/testing this on Solaris, FBSD and Linux. We may add OSX/Win to that list soon, but I can't promise at this point. For targets we are only able to test x86 and x86_64.

The reason I am asking is that changes to the template implementation files have most likely broken the support for compilers that do implicit inclusion of template implementations (a.k.a. link-time instantiation of templates). I am not incriminating anyone, mind you; I am just trying to point out there might be a gap in your testing of your changes.

NetBSD also has a fork I believe, but not sure if it's public/status

Do you know where that is?

Stefan's seems like a complete git-ification of the whole Apache repository but with no changes I could detect. He has quite a number of patches and I forget where those are kept.
I'm guessing a lot of his fixes target KDE/Qt apps and the Perennial C++VS testsuite. http://www.peren.com/pages/cppvs_set.htm

I briefly looked over some patches; there are quite a few of them. I plan to go over them in more detail over the weekend. Cheers. L
Re: STDCXX forks
On 08/31/12 08:10 PM, Liviu Nicoara wrote: NetBSD also has a fork I believe, but not sure if it's public/status

Do you know where that is?

He just posted his bulk patch: http://www.netbsd.org/~joerg/stdcxx.diff There's a small amount of CVS noise, and we already have one part on our github. I don't have more information.
Re: STDCXX forks
On Fri, Aug 31, 2012 at 8:40 AM, Liviu Nicoara nikko...@hates.ms wrote: Stefan's seem like a complete git-ification of the whole Apache repository but with no changes I could detect.

Not quite. :-) You are most likely referring to the svn repo at CVSDude here: http://kdesolaris-svn.cvsdude.com/trunk/STDCXX/4.2.1/ Yes, I maintain that fork. All the patches are in this directory: http://kdesolaris-svn.cvsdude.com/trunk/STDCXX/4.2.1/Solaris/diffs/ and they apply with the apply_patches.sh script from here: http://kdesolaris-svn.cvsdude.com/trunk/STDCXX/4.2.1/Solaris/apply_patches.sh

The official Oracle port for Solaris 10 and 11, which I maintain, is here: http://src.opensolaris.org/source/xref/userland/gate/components/stdcxx/ This is the source code which is used to build stdcxx on Solaris 10 and 11. The patches for stdcxx on Solaris are here: http://src.opensolaris.org/source/xref/userland/gate/components/stdcxx/patches/

The official Solaris 11 stdcxx package can be installed on Solaris 11 from here: http://pkg.oracle.com/solaris/release/en/search.shtml?token=stdcxx&action=Search

For Solaris 10, you need to install SUNWlibstdcxx4 - which contains the stdcxx library and its header files - and SUNWlibstdcxx4S, which contains the source code plus all the patches, and which installs in /usr/share/src/. It installs on Solaris 10 Update 10 and later.

I maintain the repo at CVSDude on an ad-hoc basis. That means now and then. :-) The source code repo at Oracle is constantly maintained, and we publish source code drops there every two weeks (I think).

--Stefan -- Stefan Teleman KDE e.V. stefan.tele...@gmail.com
Re: STDCXX forks
On Aug 31, 2012, at 8:40 AM, Liviu Nicoara nikko...@hates.ms wrote: Stefan's seem like a complete git-ification of the whole Apache repository but with no changes I could detect.

FWIW, the ASF supports git, so if people think it would help, all we'd need to do is ask #infra to move stdcxx to git.
Re: STDCXX forks
On Fri, Aug 31, 2012 at 8:58 AM, C. Bergström cbergst...@pathscale.com wrote: He has quite a number of patches and forget where those are kept. I'm guessing a lot of his fixes target KDE/Qt apps and the Perennial C++VS testsuite. http://www.peren.com/pages/cppvs_set.htm

Correction: my patches aren't related to Qt/KDE at all at this point - they are related to Solaris. Apache stdcxx is part of Solaris Core. So the authoritative patchset for Solaris/stdcxx is the one published by Oracle, since it is kept up-to-date and it represents an officially released and supported version of stdcxx.

--Stefan -- Stefan Teleman KDE e.V. stefan.tele...@gmail.com