Re: STDCXX-1066 [was: Re: STDCXX forks]

2012-09-24 Thread Stefan Teleman
On Mon, Sep 24, 2012 at 8:21 AM, Liviu Nicoara nikko...@hates.ms wrote:

 In the light of your inability to answer the simplest questions about the
 correctness and usefulness of this patch, I propose we strike the patch in
 its entirety.

Let me make something very clear to you: what I am doing here is a
courtesy to the stdcxx project. There is no requirement in my job
description to waste hours arguing with you in pointless squabbles
over email. Nor is there a requirement in the APL V2.0 which would
somehow compel us to redistribute our source code changes.

  We could and should re-work the instances in the library where
 we might use mutex and condition objects that are misaligned wrt the
 mentioned kernel update.

Yeah, why don't you go ahead and do that. Just like you fixed stdcxx-1056.

--Stefan

-- 
Stefan Teleman
KDE e.V.
stefan.tele...@gmail.com


RE: STDCXX-1066 [was: Re: STDCXX forks]

2012-09-24 Thread Travis Vitek
Comments below...

 -Original Message-
 From: Stefan Teleman
 Sent: Monday, September 24, 2012 5:46 AM
 Subject: Re: STDCXX-1066 [was: Re: STDCXX forks]
 
 On Mon, Sep 24, 2012 at 8:21 AM, Liviu Nicoara nikko...@hates.ms
 wrote:
 
  In the light of your inability to answer the simplest questions about
  the correctness and usefulness of this patch, I propose we strike the
  patch in its entirety.
 
 Let me make something very clear to you: what I am doing here is a
 courtesy to the stdcxx project.

Thank you for your contribution. There is a good chance that without your 
activity on this project over the last few months that it would be in the attic.

 There is no requirement in my job
 description to waste hours arguing with you in pointless squabbles
 over email. Nor is there a requirement in the APL V2.0 which would
 somehow compel us to redistribute our source code changes.

The problem here is that code changes are expected to pass review. They must 
follow established conventions, solve the problem in a reasonable and portable 
way, provide a test case (where one doesn't already exist), as well as make 
sense to everyone who is working with the product.

Several of the changes included don't make sense to the rest of us. Unless we 
manage to figure them out on our own or you manage to explain them to us, then 
the changes should probably be excluded from the code.

For example, the changes to src/exception.cpp don't make sense to me. Here is 
the relevant diff...

  +#if defined(_RWSTD_STRICT_SPARCV8_MUTEX_ALIGNMENT)
  +#  pragma pack(8)
  +#  pragma align 8(__rw_aligned_buffer)
  +#endif
  +
   // facet must never be destroyed
   static __rw_aligned_buffer_Msgs msgs;

It seems that we trying to give 8-byte alignment to a class of type 
__rw_aligned_buffer. The point of __rw_aligned_buffer is to give us a block of 
properly aligned data, and if it isn't aligned, then it is broken. So, why are 
we using a pragma for alignment here (and potentially in other places) when it 
seems that there is a bug in __rw_aligned_buffer that we should be addressing? 
Of course, if this is a problem with aligned buffer, we need a testcase.

Another example are the changes to src/ctype.cpp

  +#if defined(_RWSTD_STRICT_SPARCV8_MUTEX_ALIGNMENT)
  +#  pragma pack(8)
  +#  pragma align 8(f)
  +#  pragma align 8(pf)
  +#endif
   static union {
  +#if defined(_RWSTD_STRICT_SPARCV8_MUTEX_ALIGNMENT)
  +unsigned long long align_;
  +unsigned char data_ [sizeof (_STD::ctypechar)];
  +#else
   void *align_;
   char  data_ [sizeof (_STD::ctypechar)];
  +#endif
   } f;
   static __rw_facet* const pf =
   new (f) _STD::ctypechar(__rw_classic_tab, false, ref);
   pfacet = pf;
  +#if defined(_RWSTD_STRICT_SPARCV8_MUTEX_ALIGNMENT)
  +#  pragma pack(0)
  +#endif

This change seems to give alignment to `f' in two different ways, and it seems 
to be applying alignment to the pointer (the issue Liviu has brought up). It 
seems like the simpler fix would be to unconditionally add union members to `f' 
so that we get the proper alignment (much like we try to do in 
__rw_aligned_buffer), or an even better solution would be to use 
__rw_aligned_buffer.

 
  We could and should re-work the instances in the library where
  we might use mutex and condition objects that are misaligned wrt the
  mentioned kernel update.
 
 Yeah, why don't you go ahead and do that. Just like you fixed stdcxx-
 1056.
 

There is no need for comments like this.

Stefan, if you are feeling singled out and attacked because your patches aren't 
accepted without question, it might be a good idea to go back and look through 
the mailing list archives. We have _all_ been in this position. Nearly every 
time I posted a patch for the first year working on stdcxx was like this, and 
I've been doing C++ software for 15 years.

The company where I work currently has a review process that is just as 
rigorous as what you're seeing with stdcxx now. The process isn't there to 
attack anyone personally, but to ensure the quality and maintainability of the 
library for years to come.

Travis




Re: STDCXX-1066 [was: Re: STDCXX forks]

2012-09-23 Thread Liviu Nicoara

On 09/16/12 12:03, Stefan Teleman wrote:

On Sat, Sep 15, 2012 at 5:47 PM, Liviu Nicoara nikko...@hates.ms wrote:


I merely wanted to point out that restoring the default packing is not the
same as restoring the packing to the previous value in effect.

Given this, I thought about an alternative way of forcing this alignment,
e.g., via a union, aligned on an appropriate type. I see an advantage here
in that most of the changes would occur where we define the 'primitive'
mutex and condition wrappers, saving a few pre-processor conditionals and
pragmas along the way.


I understood what you were saying. I just don't understand under what
_sane_ circumstances a program would #include a system library header
file with a previously set packing to something other than default. Or
would expect a non-default packing to work during a function call to a
sytem library. That's an insane configuration on any operating system
that I know of, not just on SPARCV8.


I can't think of a valid scenario, either. I guess the point is moot.

A few more questions, if you will, as I am going through the changes:

1. I see similarities with 1040, should/would you close that one?
2. The issue only exists in MT builds, should there be a guard in configs?
3. The align reference docs talk only about aligning variables, not types. Is 
that different on SPARC?
4. I see rw/_mutex.h has alignment pragmas for both __rw_mutex_base class and 
its mutex member; same for __rw_static_mutex and its static member, etc. How 
does that work?
5. Why is __rw_guard aligned explicitly? I see it only contains a pointer to a 
mutex object.
6. The docs mention that the pragma must use the mangled variables names but I 
don't see that in the patch.

Thanks!

Liviu





Re: STDCXX-1066 [was: Re: STDCXX forks]

2012-09-23 Thread Stefan Teleman
On Sun, Sep 23, 2012 at 3:26 PM, Liviu Nicoara nikko...@hates.ms wrote:

 To be honest it's quite bizarre that you cannot share that with us. Is it
 really a trade secret? How can that be the case if Oracle customers are also
 required to perform the same alignment, perhaps using the same techniques
 like you showed in the patch?

That's the problem. I don't know what is and what is not a trade
secret, or copyrighted information, or unauthorized intellectual
property disclosure anymore.  I think it's in the eye of the
beholder. At Sun it was very clear.

Believe it or not, I had to get written approval from the Legal
Counsel's Office in order to be able to share these patches. And that
in spite of the fact that these patches are published, and have
already been published for *years*.

IANAL and I don't want to play one on TV. ;-)

--Stefan

-- 
Stefan Teleman
KDE e.V.
stefan.tele...@gmail.com


Re: STDCXX-1066 [was: Re: STDCXX forks]

2012-09-23 Thread Liviu Nicoara

On 9/23/12 3:48 PM, Stefan Teleman wrote:

On Sun, Sep 23, 2012 at 3:26 PM, Liviu Nicoara nikko...@hates.ms wrote:


To be honest it's quite bizarre that you cannot share that with us. Is it
really a trade secret? How can that be the case if Oracle customers are also
required to perform the same alignment, perhaps using the same techniques
like you showed in the patch?


That's the problem. I don't know what is and what is not a trade
secret, or copyrighted information, or unauthorized intellectual
property disclosure anymore.  I think it's in the eye of the
beholder. At Sun it was very clear.

Believe it or not, I had to get written approval from the Legal
Counsel's Office in order to be able to share these patches. And that
in spite of the fact that these patches are published, and have
already been published for *years*.


That clarifies things. Thanks.

Liviu



Re: STDCXX-1066 [was: Re: STDCXX forks]

2012-09-23 Thread Stefan Teleman
On Sun, Sep 23, 2012 at 5:23 PM, Stefan Teleman
stefan.tele...@gmail.com wrote:

 The second URL says this:

 QUOTE
 Due to a change in the implementation of the userland mutexes
 introduced by CR 6296770 in KU 137111-01, objects of type mutex_t and
 pthread_mutex_t must start at 8-byte aligned addresses. If this
 requirement is not satisfied, all non-compliant applications on
 Solaris/SPARC may fail with the signal SEGV with a callstack similar
 to the following one or with similar callstacks containing the
 function mutex_trylock_process.

   \*_atomic_cas_64(0x141f2c, 0x0, 0xff00, 0x1651, 0xff00, 0x466d90)
   set_lock_byte64(0x0, 0x1651, 0xff00, 0x0, 0xfec82a00, 0x0)
   fast_process_lock(0x141f24, 0x0, 0x1, 0x1, 0x0, 0xfeae5780)

 /QUOTE

Here's a link to an official datatype alignment table for SPARCV8:

http://docs.oracle.com/cd/E19205-01/819-5267/bkbkl/index.html

The interesting table is:

Table B–2 Storage Sizes and Default Alignments in Bytes

--Stefan

-- 
Stefan Teleman
KDE e.V.
stefan.tele...@gmail.com


Re: STDCXX-1066 [was: Re: STDCXX forks]

2012-09-23 Thread Liviu Nicoara

On 9/23/12 5:50 PM, Stefan Teleman wrote:

On Sun, Sep 23, 2012 at 5:23 PM, Stefan Teleman
stefan.tele...@gmail.com wrote:


The second URL says this:

QUOTE
Due to a change in the implementation of the userland mutexes
introduced by CR 6296770 in KU 137111-01, objects of type mutex_t and
pthread_mutex_t must start at 8-byte aligned addresses. If this
requirement is not satisfied, all non-compliant applications on
Solaris/SPARC may fail with the signal SEGV with a callstack similar
to the following one or with similar callstacks containing the
function mutex_trylock_process.

   \*_atomic_cas_64(0x141f2c, 0x0, 0xff00, 0x1651, 0xff00, 0x466d90)
   set_lock_byte64(0x0, 0x1651, 0xff00, 0x0, 0xfec82a00, 0x0)
   fast_process_lock(0x141f24, 0x0, 0x1, 0x1, 0x0, 0xfeae5780)

/QUOTE


Here's a link to an official datatype alignment table for SPARCV8:

http://docs.oracle.com/cd/E19205-01/819-5267/bkbkl/index.html

The interesting table is:

Table B–2 Storage Sizes and Default Alignments in Bytes


I see nothing really outstanding here. What is it that I should pay attention 
to?

Thanks,
Liviu


Re: STDCXX-1066 [was: Re: STDCXX forks]

2012-09-23 Thread Stefan Teleman
On Sun, Sep 23, 2012 at 7:25 PM, Liviu Nicoara nikko...@hates.ms wrote:

 I am not asking for any other implementation and I am not looking to change
 anything. I wish you could explain it to us, but in the absence of trade
 secret details I will take an explanation for the questions above.

Sorry, no.

This will not be another replay of the stdcxx-1056 email discussion,
which was a three week waste of my time.

The patch is provided AS IS. No further testing and no further
comments. I do have more important things to do.

-- 
Stefan Teleman
KDE e.V.
stefan.tele...@gmail.com


STDCXX-1056 DCII [was: Re: STDCXX-1056 [was: Re: STDCXX forks]]

2012-09-22 Thread Liviu Nicoara

On 9/17/12 5:42 PM, Liviu Nicoara wrote:

[...]
However, after more careful thought, I think there is a problem there even
though we don't have an objective proof for it, yet.

The writes are not atomic and they function just like DCII, being subject to
both compiler reordering and out of order execution. E.g., it is assumed that
the writes to the _C_impsize and _C_impdata members occur in the program order,
which would make unguarded reads/tests of _C_impsize safe. This is what the
facet initialization and access code does, conceptually:

if (_C_impsize) {
 // already initialized, use data
}
else {
 // initialize
 mutex.lock ();
 _C_impdata = get_data_from_database ();
 _C_impsize = get_data_size ();
 mutex.unlock ();
}

The mutex lock and unlock operations introduce memory barriers but they are not
guaranteeing the order in which the two writes occur. If the writes would occur
in the exact program order, unguarded reads and tests of _C_impsize would be
safe. Unfortunately, a thread may see the _C_impsize variable being updated
before the _C_impdata.


Elaborating on the above -- facet.cpp:~250, the code is actually closer to this:

if (_C_impsize) {
 // already initialized, use _C_impdata
}
else {
 // initialize
 mutex.lock ();

 if (_C_impsize)
 return _C_impdata;

 void* const pv = ...;
 size_t sz = ...;

 _C_impdata = pv;   // 1
 _C_impsize = sz;   // 2

 mutex.unlock ();
}

(1) and (2) can be reordered. Also, when mapping the STDCXX locale database, 
facet.cpp:288, there is another defect writing the two variables:


   _C_impdata = __rw_get_facet_data (cat, _C_impsize, 0, codeset);
  ^^^
  |||
Set early in __rw_get_facet_data -+++

The only way to order the above writes, preserve the fast checking for 
initialization, and generally salvaging the current algorithm is a full memory 
barrier in between the writes, e.g.:


if (_C_impsize) {
 // already initialized, use data
}
else {
 // ...

 _C_impdata = pv;
 full_membar ();
 _C_impsize = sz;

 // ...
}

Atomics are part of C++11, I wonder if anybody here has working experience with 
the atomic API. A mutex locking will not do because the lock is only acquiring 
and the write before the lock can shift down.


Of course I could be wrong. Any input is welcome.

Thanks,
Liviu


Re: STDCXX-1056 [was: Re: STDCXX forks]

2012-09-18 Thread Stefan Teleman
On Mon, Sep 17, 2012 at 11:17 AM, Liviu Nicoara nikko...@hates.ms wrote:
 I hope you agree that this synchronization is sufficient for the facet
 initialization and reading of facet data.

I have reduced the number of reported race conditions in
22.locale.numpunct.mt from 12440:

http://s247136804.onlinehome.us/stdcxx-1056-SPARC-20120917/22.locale.numpunct.mt.nts.1.er.html/index.html

to 288:

http://s247136804.onlinehome.us/stdcxx-1056-20120918/22.locale.numpunct.mt.5.er.html/index.html

The changes are in the following files:

http://s247136804.onlinehome.us/stdcxx-1056-20120918/facet.cpp
http://s247136804.onlinehome.us/stdcxx-1056-20120918/punct.cpp

_numpunct.h looks like this:

http://s247136804.onlinehome.us/stdcxx-1056-20120918/_numpunct.h

With these changes, no races conditions are repoted for any of the
functions in std::numpunctT.

Still, there are 288 race conditions being reported in
__rw_locale::__rw_locale and in std::locale::_C_get_facet. We need to
identify the source and cause of these race conditions and correct
them as well.

This is not a complete solution to the problem, because we still have
to re-write the chunk of code I eliminated from facet.cpp. It is only
step one towards finding a real solution. But, at least for now, we
have pinpointed where the source of these race conditions is located,
and what causing it.

The test program was run as: ./22.locale.numpunct.mt --nthreads=8
--nloops=1.

More to follow.

--Stefan

-- 
Stefan Teleman
KDE e.V.
stefan.tele...@gmail.com


Re: STDCXX-1056 [was: Re: STDCXX forks]

2012-09-18 Thread Liviu Nicoara

On 09/18/12 08:55, Stefan Teleman wrote:

On Mon, Sep 17, 2012 at 11:17 AM, Liviu Nicoara nikko...@hates.ms wrote:

I hope you agree that this synchronization is sufficient for the facet
initialization and reading of facet data.


I have reduced the number of reported race conditions in
22.locale.numpunct.mt from 12440:



I am attaching a test program which, while 100% MT-safe, is flagged by the 
Solaris thread analyzer. The test program contains, conceptually, the exact 
same of shared variable access like the locale management code, the facet 
management code and the facet data management code. This proves, if there was a 
need, that the tool results are to be further analyzed and interpreted but not 
taken at literal value.

The fix we are looking for is one which corrects the initial MT failure 
observed in caching the facet data, as well as it preserves the unguarded reads 
of facet data for performance reasons.

Thanks.

Liviu



http://s247136804.onlinehome.us/stdcxx-1056-SPARC-20120917/22.locale.numpunct.mt.nts.1.er.html/index.html

to 288:

http://s247136804.onlinehome.us/stdcxx-1056-20120918/22.locale.numpunct.mt.5.er.html/index.html

The changes are in the following files:

http://s247136804.onlinehome.us/stdcxx-1056-20120918/facet.cpp
http://s247136804.onlinehome.us/stdcxx-1056-20120918/punct.cpp

_numpunct.h looks like this:

http://s247136804.onlinehome.us/stdcxx-1056-20120918/_numpunct.h

With these changes, no races conditions are repoted for any of the
functions in std::numpunctT.

Still, there are 288 race conditions being reported in
__rw_locale::__rw_locale and in std::locale::_C_get_facet. We need to
identify the source and cause of these race conditions and correct
them as well.

This is not a complete solution to the problem, because we still have
to re-write the chunk of code I eliminated from facet.cpp. It is only
step one towards finding a real solution. But, at least for now, we
have pinpointed where the source of these race conditions is located,
and what causing it.

The test program was run as: ./22.locale.numpunct.mt --nthreads=8
--nloops=1.

More to follow.

--Stefan

#include stdio.h
#include stdlib.h
#include pthread.h
#include unistd.h

#define MAX_THREADS 128

static long counter = 0, nloops = 1000, nthreads = 16;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

extern C {

static void* 
f (void* pv)
{
for (size_t i = 0; i  nloops; ++i) {
if (counter == 0) {
pthread_mutex_lock (lock);
if (counter == 0)
++counter;
pthread_mutex_unlock (lock);
}
else {
// counter value is safe to use here
}
}

printf (%ld\n, counter);

return 0;
}

}

int
main (int argc, char** argv)
{
switch (argc) {
case 3:
nloops = atol (argv [2]);
case 2:
nthreads = atol (argv [1]);
break;
}

pthread_t tid [MAX_THREADS] = { 0 };

if (nthreads  MAX_THREADS)
nthreads = MAX_THREADS;

for (int i = 0; i  nthreads; ++i) {
if (pthread_create (tid + i, 0, f, 0))
exit (-1);
}

for (int i = 0; i  nthreads; ++i) {
if (tid [i])
pthread_join (tid [i], 0);
}

return 0;
}


Re: STDCXX-1056 [was: Re: STDCXX forks]

2012-09-18 Thread Stefan Teleman
On Tue, Sep 18, 2012 at 12:43 PM, Liviu Nicoara nikko...@hates.ms wrote:

 I am attaching a test program which, while 100% MT-safe, is flagged by
 the Solaris thread analyzer.

The program as written is not thread safe. It is reading the value of
the counter variable and performing a zero comparison outside of a
mutex lock:

for (size_t i = 0; i  nloops; ++i) {
if (counter == 0) {  // --- 
pthread_mutex_lock (lock);
if (counter == 0)
++counter;
pthread_mutex_unlock (lock);
}
else {
// counter value is safe to use here
}
}




-- 
Stefan Teleman
KDE e.V.
stefan.tele...@gmail.com


Re: STDCXX-1056 [was: Re: STDCXX forks]

2012-09-18 Thread Liviu Nicoara

On 09/18/12 13:21, Stefan Teleman wrote:

On Tue, Sep 18, 2012 at 12:43 PM, Liviu Nicoara nikko...@hates.ms wrote:


I am attaching a test program which, while 100% MT-safe, is flagged by
the Solaris thread analyzer.


The program as written is not thread safe. It is reading the value of
the counter variable and performing a zero comparison outside of a
mutex lock:


Stefan, I urge you to consider the argument on its merits. Yes, the thread 
analyzer flags it, but it is nonetheless MT-safe. Specifically:

1. writes are properly synchronized wrt themselves
2. reads are inherently thread-safe wrt themselves
3. reads are properly synchronized wrt writes
4.no thread can possibly observe an intermediate or otherwise incomplete value.

I will also add that the flag is either 0 or 1 during the execution of the 
program, with only one transition from 0 to 1, performed by one single thread.

I will concede that I might be wrong and I am open to arguments. I would accept 
as a counter-argument this program if you can show a runtime failure. I would 
also accept as argument a scenario under which two threads would see 
inconsistent/incorrect values or write the variable more than once, etc.

Thanks,
Liviu



for (size_t i = 0; i  nloops; ++i) {
 if (counter == 0) {  // --- 
 pthread_mutex_lock (lock);
 if (counter == 0)
 ++counter;
 pthread_mutex_unlock (lock);
 }
 else {
 // counter value is safe to use here
 }
 }



Re: STDCXX-1056 [was: Re: STDCXX forks]

2012-09-18 Thread Stefan Teleman
On Tue, Sep 18, 2012 at 4:35 PM, Liviu Nicoara nikko...@hates.ms wrote:

 I will concede that I might be wrong and I am open to arguments. I would
 accept as a counter-argument this program if you can show a runtime failure.

The the first read of the counter variable is outside a mutex lock
correct? The read is followed by a 0 comparison, correct?

What guarantees that between the read and the comparison the value of
the counter variable hasn't been modified by another thread? The
thread currently doing the comparison cannot guarantee it: it hasn't
locked the mutex. Other threads may be running - actually they
probably are. Another thread may have already acquired the mutex and
incremented the value of counter. Your thread has no way of knowing if
that has happened, because it does not yet have exclusive access to
the counter variable. It will, after it acquires the mutex.

Where is it reading the variable  from? A register? Is it declared
volatile? L2 cache? L3 cache?

The program, as you wrote it, implicitly acknowledges that it is not
thread safe. That is the point of the double check: one before the
mutex lock, and one after it. The point of the first check has nothing
to do with thread-safety, and everything to do with a minor
optimization: if the value stored in variable counter is already not
zero, then there's no point in locking the mutex or performing the
increment.

--Stefan

-- 
Stefan Teleman
KDE e.V.
stefan.tele...@gmail.com


Re: STDCXX-1056 [was: Re: STDCXX forks]

2012-09-18 Thread Liviu Nicoara

On 09/18/12 18:24, Stefan Teleman wrote:

On Tue, Sep 18, 2012 at 4:35 PM, Liviu Nicoara nikko...@hates.ms wrote:


I will concede that I might be wrong and I am open to arguments. I would
accept as a counter-argument this program if you can show a runtime failure.


The the first read of the counter variable is outside a mutex lock
correct? The read is followed by a 0 comparison, correct?


Correct.



What guarantees that between the read and the comparison the value of
the counter variable hasn't been modified by another thread? The
thread currently doing the comparison cannot guarantee it: it hasn't
locked the mutex. Other threads may be running - actually they
probably are. Another thread may have already acquired the mutex and
incremented the value of counter. Your thread has no way of knowing if
that has happened, because it does not yet have exclusive access to
the counter variable. It will, after it acquires the mutex.


It does not matter if the value is changed concurrently in between the 
reading of it and the actual comparison. The value can only be 0, in 
which case it takes the branch that locks the mutex and the next read 
will absolutely see a fresh value; or it can be 1 in which case it has 
already been initialized and this thread does not pay the penalty of a 
lock anymore.




Where is it reading the variable  from? A register? Is it declared
volatile? L2 cache? L3 cache?


Irrelevant. If the value has not been changed it can be read from either 
the cache or directly from the memory. If the value has been changed 
concurrently in between the reading and the actual comparison, the 
processor sees either a stale value of 0, or the new value 1. It is 
guaranteed that any thread reading the value after the (one and only) 
unlock will see a fresh value.




The program, as you wrote it, implicitly acknowledges that it is not
thread safe. That is the point of the double check: one before the
mutex lock, and one after it.


That is a misstatement of the program intentions. See below.


The point of the first check has nothing
to do with thread-safety, and everything to do with a minor
optimization: if the value stored in variable counter is already not
zero, then there's no point in locking the mutex or performing the
increment.


The first check is indeed an optimization. It is the point of this 
exercise to show that the unguarded reads in the localization library 
are not defects and the code, simplified in my test case, is thread safe 
in exactly the respects I mentioned before: the program only observes 
consistent, correct values (program states) in a concurrent environment.


HTH.

Liviu



Re: STDCXX-1056 [was: Re: STDCXX forks]

2012-09-18 Thread Liviu Nicoara

On 9/18/12 7:04 PM, Stefan Teleman wrote:

On Tue, Sep 18, 2012 at 6:42 PM, Liviu Nicoara nikko...@hates.ms wrote:


The first check is indeed an optimization. It is the point of this exercise
to show that the unguarded reads in the localization library are not defects
and the code, simplified in my test case, is thread safe in exactly the
respects I mentioned before: the program only observes consistent, correct
values (program states) in a concurrent environment.


The Intel Thread Analyzer shows a data race in the exact same spot the
SunPro Thread Analyzer shows a data race, and in the exact same spot I
have stated earlier that there is a data race.

The results are available here:

http://s247136804.onlinehome.us/liviu-mt-test/22.mt.test.r000ti3.inspxez

As usual, it is an Intel Inspector XE 2003 Analysis Results Blob. It
opens with the Intel Inspector. i don't know what format it is in, or
whether or not it can be read by other means.

So far we have SunPro 12.3/Linux/Intel, SunPro 12.3/Solaris/Intel,
SunPro 12.3/Solaris/SPARC and Intel 2003/Linux/Intel saying that there
is a race condition in your program.

And now I would like to get back to solving the thread-unsafety and
race conditions problems in Apache stdcxx.


Stefan, you cannot ignore others' arguments and embrace a 'no' attitude; the 
tools are not saying what you think they do. They all show the same thing 
because they all do the exact same analysis and we could delve into that, too. I 
sincerely hoped you could come up with some sort of a logical argument that 
would prove my latest test case wrong, or at least show it failing at runtime.


My time is also limited and I have already spent more time than I wanted to 
trying to present a logical argument, down to the simplest test cases. I say we 
defer this to Martin for when he will be back from vacation, next week. For the 
record, because people may not have the energy to read all this back and forth, 
I will summarize here the gist of this thread:


1. The facet data caching is not MT-safe
2. The facet data initialization (STDCXX or system locales) is safe (*)
3. There is no unit test currently showing a failure in (2)
4. Timing results show that caching may be slower than non-caching in MT builds
5. A fix should, ideally, be binary compatible
6. A fix should, ideally, preserve performance or increase it
7. There is one patch, currently attached to the issue, by Stefan
8. Other partial patches are referenced from this thread

Please correct me if I missed anything. The above summary is a good starting 
point for a new thread dealing with this issue.


HTH.

Liviu

(*) When perfectly forwarding, with a theoretical concern I asked in an earlier 
post, directed at Martin.




Re: STDCXX-1056 [was: Re: STDCXX forks]

2012-09-18 Thread Stefan Teleman
On Tue, Sep 18, 2012 at 7:38 PM, Liviu Nicoara nikko...@hates.ms wrote:

 1. The facet data caching is not MT-safe
 2. The facet data initialization (STDCXX or system locales) is safe (*)
 3. There is no unit test currently showing a failure in (2)
 4. Timing results show that caching may be slower than non-caching in MT
 builds
 5. A fix should, ideally, be binary compatible
 6. A fix should, ideally, preserve performance or increase it
 7. There is one patch, currently attached to the issue, by Stefan
 8. Other partial patches are referenced from this thread

 Please correct me if I missed anything. The above summary is a good starting

For the record, I fundamentally disagree with your assessment above.
It is not based on any verifiable and reproducible facts, analysis of
facts and/or measurements. It is based on assertions, which are, in
turn, based on other assertions.

As long as you continue dismissinsg the results from 4 [ FOUR ]
different thread analyzer, on 4 [ FOUR ] different operating systems,
and that solely on the basis on your assertions and beliefs, there is
no point in continuing this debate.

The results from all four thread analyzers contradict to all of your
assertions. If you firmly and strongly believe that you are always
right, and that the four thread analyzers are always wrong, that is
your choice.

--Stefan

-- 
Stefan Teleman
KDE e.V.
stefan.tele...@gmail.com


Re: STDCXX-1056 [was: Re: STDCXX forks]

2012-09-17 Thread Stefan Teleman
On Mon, Sep 17, 2012 at 8:46 AM, Liviu Nicoara nikko...@hates.ms wrote:

 In the meantime I would like to stress again that __rw_get_numpunct is
 perfectly thread-safe and does not need extra locking for perfect
 forwarding.

So, by removing the test for

  if (!(_C_flags  _RW::__rw_gr))

(or any other bitmask for that matter), the functions which were
thread-unsafe - and were exhibiting all the symptoms of a run-time
race condition -, magically became thread-safe?

I have looked *extensively* at the code in __rw_get_numpunct. It is
inherently thread-unsafe.

--Stefan

-- 
Stefan Teleman
KDE e.V.
stefan.tele...@gmail.com


Re: STDCXX-1056 [was: Re: STDCXX forks]

2012-09-17 Thread Liviu Nicoara

On 09/17/12 09:51, Stefan Teleman wrote:

On Mon, Sep 17, 2012 at 8:46 AM, Liviu Nicoara nikko...@hates.ms wrote:


In the meantime I would like to stress again that __rw_get_numpunct is
perfectly thread-safe and does not need extra locking for perfect
forwarding.


So, by removing the test for

   if (!(_C_flags  _RW::__rw_gr))

(or any other bitmask for that matter), the functions which were
thread-unsafe - and were exhibiting all the symptoms of a run-time
race condition -, magically became thread-safe?

I have looked *extensively* at the code in __rw_get_numpunct. It is
inherently thread-unsafe.


I mean to say that no extra locking is necessary when the public interface 
forwards and no caching is done.

In more detail: __rw_get_numpunct code is entered upon a call from the 
protected virtual interface. It calls facet _C_data member function which 
either returns objects constructed from previously-initialized POD data (in 
which case no locking is necessary), or it attempts to first initialize the 
facet data from the STDCXX database.

If the latter, the execution goes in the facet data initialization code which 
is appropriately synchronized, see this stack trace:

(gdb) bt
#0  __rw::__rw_facet::_C_get_data (this=0x6e10a0) at 
srcdir/stdcxx/branches/4.2.x/src/facet.cpp:179
#1  0x0043788b in __rw::__rw_facet::_C_data (this=0x6e10a0) at 
srcdir/stdcxx/branches/4.2.x/include/loc/_facet.h:194
#2  0x0044874f in __rw::__rw_get_numpunct (pfacet=0x6e10a0, flags=4) at 
srcdir/stdcxx/branches/4.2.x/src/punct.cpp:80
#3  0x00449a14 in __rw::__rw_get_punct (pfacet=0x6e10a0, flags=4) at 
srcdir/stdcxx/branches/4.2.x/src/punct.cpp:578
#4  0x0041cecf in std::numpunctchar::do_grouping (this=0x6e10a0) at 
srcdir/stdcxx/branches/4.2.x/include/loc/_numpunct.h:99
#5  0x0041bd31 in std::numpunctchar::grouping (this=0x6e10a0) at 
srcdir/stdcxx/branches/4.2.x/include/loc/_numpunct.h:190
#6  0x004036b8 in main (argc=2, argv=0x7fffe018) at 
srcdir/stdcxx/branches/4.2.x/tests/localization/s.cpp:29
(gdb)

When the above fails, the facet data has not been initialized, e.g., when there 
is no STDCXX locale database, or the name of the locale does not refer to a 
locale in STDCXX database. The __rw_get_numpunct function will attempt next to 
use libc and system locales, via the __rw_setlocale class, which again is 
properly synchronized and the scope of that lock extends to cover the facet 
data initialization. See this stack trace:

(gdb) bt
#0  __rw::__rw_setlocale::__rw_setlocale (this=0x7fffdd30, locname=0x6e112a 
en_US.utf8, cat=6, nothrow=0) at 
srcdir/stdcxx/branches/4.2.x/src/setlocale.cpp:84
#1  0x00448974 in __rw::__rw_get_numpunct (pfacet=0x6e10a0, flags=4) at 
srcdir/stdcxx/branches/4.2.x/src/punct.cpp:134
#2  0x00449a14 in __rw::__rw_get_punct (pfacet=0x6e10a0, flags=4) at 
srcdir/stdcxx/branches/4.2.x/src/punct.cpp:578
#3  0x0041cecf in std::numpunctchar::do_grouping (this=0x6e10a0) at 
srcdir/stdcxx/branches/4.2.x/include/loc/_numpunct.h:99
#4  0x0041bd31 in std::numpunctchar::grouping (this=0x6e10a0) at 
srcdir/stdcxx/branches/4.2.x/include/loc/_numpunct.h:190
#5  0x004036b8 in main (argc=2, argv=0x7fffe018) at 
srcdir/stdcxx/branches/4.2.x/tests/localization/s.cpp:29

I hope you agree that this synchronization is sufficient for the facet 
initialization and reading of facet data.

Thanks.

Liviu


Re: STDCXX-1056 [was: Re: STDCXX forks]

2012-09-17 Thread Stefan Teleman
On Mon, Sep 17, 2012 at 11:17 AM, Liviu Nicoara nikko...@hates.ms wrote:

 I hope you agree that this synchronization is sufficient for the facet
 initialization and reading of facet data.

Sorry, I do not agree. Two different thread analyzers from two
different compilers written by two different compiler writers tell me
not to.

--Stefan

-- 
Stefan Teleman
KDE e.V.
stefan.tele...@gmail.com


Re: STDCXX-1056 [was: Re: STDCXX forks]

2012-09-17 Thread Liviu Nicoara

On 09/17/12 11:21, Stefan Teleman wrote:

On Mon, Sep 17, 2012 at 11:17 AM, Liviu Nicoara nikko...@hates.ms wrote:


I hope you agree that this synchronization is sufficient for the facet
initialization and reading of facet data.


Sorry, I do not agree. Two different thread analyzers from two
different compilers written by two different compiler writers tell me
not to.


Stefan, that is indeed your prerogative. However, please keep in mind that 
tools may be buggy or may have limitations beyond what is advertised. I do not 
ask you to have faith in my analysis, which would be absurd, but to look for 
yourself, exercise due diligence in testing the code and draw your own 
conclusions.

Thanks,
Liviu


Re: STDCXX-1056 [was: Re: STDCXX forks]

2012-09-17 Thread Wojciech Meyer
Liviu Nicoara nikko...@hates.ms writes:

 On 09/17/12 11:21, Stefan Teleman wrote:
 On Mon, Sep 17, 2012 at 11:17 AM, Liviu Nicoara nikko...@hates.ms wrote:

 I hope you agree that this synchronization is sufficient for the facet
 initialization and reading of facet data.

 Sorry, I do not agree. Two different thread analyzers from two
 different compilers written by two different compiler writers tell me
 not to.

 Stefan, that is indeed your prerogative. However, please keep in mind
 that tools may be buggy or may have limitations beyond what is
 advertised. I do not ask you to have faith in my analysis, which would
 be absurd, but to look for yourself, exercise due diligence in testing
 the code and draw your own conclusions.

so which compilers do fail? You know, some of them might use the same
component.

-- 
Wojciech



Re: STDCXX-1056 [was: Re: STDCXX forks]

2012-09-17 Thread Stefan Teleman
On Mon, Sep 17, 2012 at 11:38 AM, Wojciech Meyer wojciech.me...@arm.com wrote:

 so which compilers do fail? You know, some of them might use the same
 component.

Intel Compiler/Thread Analyzer on Linux, SunPro Compiler/Thread
Analyzer on Linux and Solaris (Intel and SPARC). All three of them
show the same exact problems.

The Intel Compilers and the SunPro Compilers have nothing in common
with each other.

-- 
Stefan Teleman
KDE e.V.
stefan.tele...@gmail.com


Re: STDCXX-1056 [was: Re: STDCXX forks]

2012-09-16 Thread Stefan Teleman
On Sat, Sep 15, 2012 at 4:53 PM, Liviu Nicoara nikko...@hates.ms wrote:

 Now, to clear the confusion I created: the timing numbers I posted in the
 attachment stdcxx-1056-timings.tgz to STDCXX-1066 (09/11/2012) showed that a
 perfectly forwarding, no caching public interface (exemplified by a changed
 grouping) performs better than the current implementation. It was that test
 case that I hoped you could time, perhaps on SPARC, in both MT and ST
 builds. The t.cpp program is for MT, s.cpp for ST.

I got your patch, and have tested it.

I have created two Experiments (that's what they are called) with the
SunPro Performance Analyzer. Both experiments are targeting race
conditions and deadlocks in the instrumented program,  and both
experiments are running the 22.locale.numpunct.mt program from the
stdcxx test harness. One experiment is with  your patch applied. The
other experiment is with our (Solaris) patch applied.

Here are the results:

1. with your patch applied:

http://s247136804.onlinehome.us/22.locale.numpunct.mt.1.er.nts/

2. with our (Solaris) patch applied:

http://s247136804.onlinehome.us/22.locale.numpunct.mt.1.er.ts/

--Stefan

-- 
Stefan Teleman
KDE e.V.
stefan.tele...@gmail.com


Re: STDCXX-1056 [was: Re: STDCXX forks]

2012-09-16 Thread Liviu Nicoara

On 9/16/12 3:20 AM, Stefan Teleman wrote:

On Sat, Sep 15, 2012 at 4:53 PM, Liviu Nicoara nikko...@hates.ms wrote:


Now, to clear the confusion I created: the timing numbers I posted in the
attachment stdcxx-1056-timings.tgz to STDCXX-1066 (09/11/2012) showed that a
perfectly forwarding, no caching public interface (exemplified by a changed
grouping) performs better than the current implementation. It was that test
case that I hoped you could time, perhaps on SPARC, in both MT and ST
builds. The t.cpp program is for MT, s.cpp for ST.


I got your patch, and have tested it.


Thanks, Stefan. I looked over it and it seems very similar to, and somewhat more 
detailed than gprof profiling output.


I am going to update the incident shortly with a more detailed timing 
measurements on my side, in the form of a new attachment. Just FYI in case you 
still don't get notifications.


Liviu


Re: STDCXX-1056 [was: Re: STDCXX forks]

2012-09-16 Thread Liviu Nicoara

On 9/16/12 11:21 AM, Liviu Nicoara wrote:

On 9/16/12 3:20 AM, Stefan Teleman wrote:

On Sat, Sep 15, 2012 at 4:53 PM, Liviu Nicoara nikko...@hates.ms wrote:


Now, to clear the confusion I created: the timing numbers I posted in the
attachment stdcxx-1056-timings.tgz to STDCXX-1066 (09/11/2012) showed that a
perfectly forwarding, no caching public interface (exemplified by a changed
grouping) performs better than the current implementation. It was that test
case that I hoped you could time, perhaps on SPARC, in both MT and ST
builds. The t.cpp program is for MT, s.cpp for ST.


I got your patch, and have tested it.


Thanks, Stefan. I looked over it and it seems very similar to, and somewhat more
detailed than gprof profiling output.

I am going to update the incident shortly with a more detailed timing
measurements on my side, in the form of a new attachment. Just FYI in case you
still don't get notifications.


I have attached a new set of results to the incident, in the form of the 
archive:

http://tinyurl.com/9drzg4e

Please see the content for a description of the library changes (_numpunct.h 
file), the MT test program (t.cpp) and the results collected through two 
separate builds on two different machines (results.txt file).


Thanks.

Liviu


STDCXX-1066 [was: Re: STDCXX forks]

2012-09-15 Thread Liviu Nicoara

On 9/1/12 1:52 PM, Stefan Teleman wrote: On Sat, Sep 1, 2012 at 12:15 PM, 
 I opened yesterday STDCXX-1066:

 https://issues.apache.org/jira/browse/STDCXX-1056

 about the pthread_mutex_t/pthread_cond_t alignment on SPARCV8. I'll
 have patches done this weekend. Achtung: the patchset is very large
 and touches a very large number of files. It's strange that I didn't
 get an email about STDCXX-1066.


Hi Stefan,

I have read through the patches attached to the incident, then I briefly read 
about the SunPro pragma align and pack. Two questions:


1. AFAICT, the use of the packing pragma may interfere with a user's setting of 
the same value. I.e., a user sets the packing in their sources and then, 
directly or not, includes an STDCXX header. It seems to me that in such a 
situation, our setting of the packing value would interfere with the rest of the 
user's translation unit, since there is no way to `restore' the previous packing 
value.


Something along the lines of:

// user source file

#pragma pack (X) // X != 8

#include iostream

struct UserDef
{
// different alignment than X ?
// ...
};

Is my understanding correct?

2. The patches are against 4.2.1, but the change would be binary incompatible 
with the already released 4.2.1 branch. Do you plan to have this fix in 4.3.x?


Thanks.

Liviu


Re: STDCXX-1056 [was: Re: STDCXX forks]

2012-09-15 Thread Stefan Teleman
On Sat, Sep 15, 2012 at 9:01 AM, Liviu Nicoara nikko...@hates.ms wrote:

 That is funny. What compiler are you using? What does the following test
 case return for you?

It's the Intel compiler with the patched stdcxx for  the wrong case
and GCC 4.7.1 + GNU libstdc++ for  the correct case.

GCC + GNU libstdc++ are correct.

The patched facet does not call the protected do_*() virtual functions
from their public counterparts, as it is required to do by the
Standard. Instead, it returns the data mebers directly (the data
members were initialized in the constructor). That is the patch you
proposed, which is indeed much better performing than using a mutex
lock. Unfortunately, in doing so, overriding the virtual functions in
a derived facet type becomes pointless.

--Stefan

-- 
Stefan Teleman
KDE e.V.
stefan.tele...@gmail.com


Re: STDCXX-1056 [was: Re: STDCXX forks]

2012-09-15 Thread Liviu Nicoara

On 9/15/12 1:51 PM, Stefan Teleman wrote:

On Sat, Sep 15, 2012 at 9:01 AM, Liviu Nicoara nikko...@hates.ms wrote:


That is funny. What compiler are you using? What does the following test
case return for you?


It's the Intel compiler with the patched stdcxx for  the wrong case
and GCC 4.7.1 + GNU libstdc++ for  the correct case.

GCC + GNU libstdc++ are correct.

The patched facet does not call the protected do_*() virtual functions
from their public counterparts, as it is required to do by the
Standard. Instead, it returns the data mebers directly (the data
members were initialized in the constructor). That is the patch you
proposed, which is indeed much better performing than using a mutex
lock. Unfortunately, in doing so, overriding the virtual functions in
a derived facet type becomes pointless.


Ahh, I see now. You are indeed right, that patch is defective. I was under the 
impression that we were discussing the (later) attachment 
stdcxx-1056-timings.tgz which contains a perfectly forwarding implementation of 
the facet public grouping method. The timings I attached there were the ones I 
thought we were discussing all along.


Now, to clear the confusion I created: the timing numbers I posted in the 
attachment stdcxx-1056-timings.tgz to STDCXX-1066 (09/11/2012) showed that a 
perfectly forwarding, no caching public interface (exemplified by a changed 
grouping) performs better than the current implementation. It was that test case 
that I hoped you could time, perhaps on SPARC, in both MT and ST builds. The 
t.cpp program is for MT, s.cpp for ST.


Please let me know if this clarifies things. I apologize for the 
misunderstanding.

Liviu


Re: STDCXX-1066 [was: Re: STDCXX forks]

2012-09-15 Thread Liviu Nicoara

On 9/15/12 2:57 PM, Stefan Teleman wrote:

On Sat, Sep 15, 2012 at 1:02 PM, Liviu Nicoara nikko...@hates.ms wrote:


I have read through the patches attached to the incident, then I briefly
read about the SunPro pragma align and pack. Two questions:

1. AFAICT, the use of the packing pragma may interfere with a user's setting
of the same value. I.e., a user sets the packing in their sources and then,
directly or not, includes an STDCXX header. It seems to me that in such a
situation, our setting of the packing value would interfere with the rest of
the user's translation unit, since there is no way to `restore' the previous
packing value.

Something along the lines of:

// user source file

#pragma pack (X) // X != 8

#include iostream

struct UserDef
{
 // different alignment than X ?
 // ...
};

Is my understanding correct?


Yes, you are indeed correct, Sir. :-)

However, for every #pragma pack(8) there should be a corresponding
subsequent #pragma pack(0). If there isn't, that's a patch bug.
#pragma pack(0) restores the default alignment. So, there should be
(there *must* be) no silent packing side-effects in user programs.


Yes, but it restores the default packing, not an arbitrary one, potentially set 
by the user prior to including our headers. Say, the user sets 2, the default is 
4 and we set 8. When we set it to default it goes back to 4, instead of the 
expected 2. Did I get this right?





2. The patches are against 4.2.1, but the change would be binary
incompatible with the already released 4.2.1 branch. Do you plan to have
this fix in 4.3.x?


Yes, definitely for 4.2.x and 4.3.x. I sent them for 4.2.1 just
because that's what I have handy.


I see now. I tried to apply them to 4.2.x and some chunks failed, and I thought 
I was not applying them where they were intended to go.


Thanks!

Liviu


Re: STDCXX-1066 [was: Re: STDCXX forks]

2012-09-15 Thread Liviu Nicoara

On 9/15/12 5:19 PM, Stefan Teleman wrote:

On Sat, Sep 15, 2012 at 4:57 PM, Liviu Nicoara nikko...@hates.ms wrote:


Yes, but it restores the default packing, not an arbitrary one, potentially
set by the user prior to including our headers. Say, the user sets 2, the
default is 4 and we set 8. When we set it to default it goes back to 4,
instead of the expected 2. Did I get this right?


This is true, but leaving some arbirary pragma pack(N) (for N != 0) in
effect for the duration of a program, and expecting it to work sounds
like a very defective programming approach to me. It will certainly
not work on SPARC at all.  if the program needs to pack something in a
certain specific way, it need to do so for that specific something
only. Otherwise the side-effects of globally setting a non-default
packing will destroy the program anyway.


I merely wanted to point out that restoring the default packing is not the same 
as restoring the packing to the previous value in effect.


Given this, I thought about an alternative way of forcing this alignment, e.g., 
via a union, aligned on an appropriate type. I see an advantage here in that 
most of the changes would occur where we define the 'primitive' mutex and 
condition wrappers, saving a few pre-processor conditionals and pragmas along 
the way.


What do you think?

Liviu


Re: STDCXX-1056 [was: Re: STDCXX forks]

2012-09-12 Thread Liviu Nicoara

On 09/12/12 00:21, Stefan Teleman wrote:

On Tue, Sep 11, 2012 at 10:18 PM, Liviu Nicoara nikko...@hates.ms wrote:


AFAICT, there are two cases to consider:

1. Using STDCXX locale database initializes the __rw_punct_t data in the
first, properly synchronized pass through __rw_get_numpunct. All subsequent
calls use the __rw_punct_t data to construct returned objects.
2. Using the C library locales does the same in the first pass, via
setlocale and localeconv, but setlocale synchronization is via a per-process
lock. The facet data, once initialized is used just like above.

I probably missed this in the previous conversation, but did you detect a
race condition in the tests if the facets are simply forwarding to the
private virtual interface? I.e., did you detect that the facet
initialization code is unsafe? I think the facet __rw_punct_t data is safely
initialized in both cases, it's the caching that is done incorrectly.


I originally thought so too, but now I'm having doubts. :-) And I
haven't tracked it down with 100% accuracy yet. I saw today this
comment in src/facet.cpp, line 358:

// a per-process array of facet pointers sufficiently large
// to hold (pointers to) all standard facets for 8 locales
static __rw_facet*  std_facet_buf [__rw_facet::_C_last_type * 8];


The localization code is cleverly dense and low-level. There are a few patterns 
that are used throughout, though. One is the function-local static buffers, 
e.g., the locales buffer and the facets buffer, in locale_body.cpp and 
facet.cpp, respectively. They both start from a reasonable size and are grown 
and shrunk according to the needs, but neither locales nor facets are ever 
evicted from them. See:

locale_body:936  -- reduction of locale buffer
locale_body:1006 -- expansion of locale buffer

facet.cpp:397-- reduction of facet buffer
facet.cpp:462-- expansion of facet buffer

Facets buffer is guarded by a static guard in:

__rw_facet::_C_manage:366

The locales buffer is protected by a static guard in:

__rw_locale::_C_manage:880




[...] this is all investigative stuff for tomorrow. :-)

and I agree with Martin that breaking ABI in a minor release is really
not an option. I'm trying to find the best way of making these facets
thread-safe while inflicting the least horrible performance hit.


We are getting close to the point where we can summarize all these findings and 
gauge an appropriate course of action.



I will run your tests tomorrow and let you know. :-)



That would be very helpful!

Liviu


Re: STDCXX-1056 [was: Re: STDCXX forks]

2012-09-11 Thread Liviu Nicoara

On 9/11/12 9:40 PM, Martin Sebor wrote:

On 09/11/2012 04:15 PM, Stefan Teleman wrote:

On Mon, Sep 10, 2012 at 4:24 PM, Stefan Teleman

I think I have something which doesn't break BC - stay tuned because
I'm testing it now.


OK.

So, here's a possible implementation of __rw_get_numpunct() with
minimal locking, which passes the MT tests and does not break ABI:

s247136804.onlinehome.us/stdcxx-1056-20120911/punct.cpp

And the same for include/loc/_numpunct.h:

http://s247136804.onlinehome.us/stdcxx-1056-20120911/_numpunct.h

In _numpunct.h, all the functions perform no checks and no lazy
initialization. They function simply as a pass-through to
__rw_get_numpunct(). std::numpunctT's data members are now dead
varaiables.

The bad: performance is no better than with locking the mutex inside
each of the std::numpunctT::*() functions, and with lazy
instantiation.


I wouldn't expect this to be faster than the original. In fact,
I would expect it to be slower because each call to one of the
public, non-virtual members results in a call to the out-of-line
virtual functions (and another to __rw_get_moneypunct). Avoiding
the overhead of such calls is the main and only reason why the
caching exists.



AFAICT, there are two cases to consider:

1. Using STDCXX locale database initializes the __rw_punct_t data in the first, 
properly synchronized pass through __rw_get_numpunct. All subsequent calls use 
the __rw_punct_t data to construct returned objects.
2. Using the C library locales does the same in the first pass, via setlocale 
and localeconv, but setlocale synchronization is via a per-process lock. The 
facet data, once initialized is used just like above.


I probably missed this in the previous conversation, but did you detect a race 
condition in the tests if the facets are simply forwarding to the private 
virtual interface? I.e., did you detect that the facet initialization code is 
unsafe? I think the facet __rw_punct_t data is safely initialized in both cases, 
it's the caching that is done incorrectly.



I'm afraid unoptimized timings don't tell us much. Neither does
a comparison between two compilers, even on the same OS.

I looked at Liviu's timings today. I was puzzled by the difference
between (1) which, IIUC, is the current implementation (presumably
an optimized, thread-safe build with the same compiler and OS) and
(4), which, again IIUC, is the equivalent of your latest patch here
(again, presumably optimized, thread safe, same compiler/OS). I'm
having trouble envisioning how calling a virtual function to
retrieve the value of grouping can possibly be faster than not
calling it (and simply returning the value cached in the data
member of the facet.



The new results I attached to the issue come from a a bit clearer tests and they 
focus on just two cases: the current implementation vs. a non-caching one; the 
latter just forwards the grouping calls to the protected do_grouping, with _no_ 
other changes to the implementation.


The timing numbers seem to show that MT builds fare far worse with the caching 
than without. Stefan, if you have the time, could you please infirm :) my 
conclusions by timing it on one of your machines?


Thanks,

Liviu



Re: STDCXX-1056 [was: Re: STDCXX forks]

2012-09-11 Thread Stefan Teleman
On Tue, Sep 11, 2012 at 10:18 PM, Liviu Nicoara nikko...@hates.ms wrote:

 AFAICT, there are two cases to consider:

 1. Using STDCXX locale database initializes the __rw_punct_t data in the
 first, properly synchronized pass through __rw_get_numpunct. All subsequent
 calls use the __rw_punct_t data to construct returned objects.
 2. Using the C library locales does the same in the first pass, via
 setlocale and localeconv, but setlocale synchronization is via a per-process
 lock. The facet data, once initialized is used just like above.

 I probably missed this in the previous conversation, but did you detect a
 race condition in the tests if the facets are simply forwarding to the
 private virtual interface? I.e., did you detect that the facet
 initialization code is unsafe? I think the facet __rw_punct_t data is safely
 initialized in both cases, it's the caching that is done incorrectly.

I originally thought so too, but now I'm having doubts. :-) And I
haven't tracked it down with 100% accuracy yet. I saw today this
comment in src/facet.cpp, line 358:

// a per-process array of facet pointers sufficiently large
// to hold (pointers to) all standard facets for 8 locales
static __rw_facet*  std_facet_buf [__rw_facet::_C_last_type * 8];

this leads me to suspect that there is an upper limit of 8 locales +
their standard facets. If the locales (and their facets) are being
recycled in and out of this 8-limit cache, that would explain the
other thing I also noticed (which also answers your question): yes, i
have gotten the dreaded strcmp(3C) 'Assertion failed' in
22.locale.numpunct.mt when I test implemented 22.locale.numpunct.mt in
a similar way to your tests. which in theory shouldn't happen, but it
did. which means that there's something going on with
behind-the-scenes facet re-initialization that i haven't found yet.
which would partially explain your observation that MT-tests perform
much worse with caching than without.

this is all investigative stuff for tomorrow. :-)

and I agree with Martin that breaking ABI in a minor release is really
not an option. I'm trying to find the best way of making these facets
thread-safe while inflicting the least horrible performance hit.

i will run your tests tomorrow and let you know. :-)

--Stefan

-- 
Stefan Teleman
KDE e.V.
stefan.tele...@gmail.com


Re: STDCXX-1056 [was: Re: STDCXX forks]

2012-09-10 Thread Stefan Teleman
On Thu, Sep 6, 2012 at 6:43 PM, Stefan Teleman stefan.tele...@gmail.com wrote:
 On Thu, Sep 6, 2012 at 4:02 PM, Martin Sebor mse...@gmail.com wrote:

 Here's a thought: it's not pretty but how about having
 the function initialize the facet? It already has a pointer
 to the base class, so it could downcast it to std::numpunct
 (or moneypunct, respectively), and assign the initial values
 to the members. Would that work?

 I haven't looked at them in detail (yet) but a cursory look shows that
 they're both recursive for the successful case.

It's not going to work that way.

For one, __rw_get_numpunct() and __rw_get_moneypunct() are static
functions in the __rw namespace. Neither can access or modify the
std::numpunctT or std::moneypunctT data members directly, because
they are private.

Second, both __rw_get_numpunct() and __rw_get_moneypunct() are
recursive. Unless we want to start playing with
PTHREAD_MUTEX_RECURSIVE, which I'm not at all sure is supported on all
the platforms we support, we're not going to be able to solve the
thread-safety problem here (Linux supports it as
PTHREAD_MUTEX_RECURSIVE_NP, Solaris supports it
PTHREAD_MUTEX_RECURSIVE).

Third, both __rw_get_numpunct() and __rw_get_moneypunct() can return a
NULL pointer. This is bad, because it will cause a SEGV at string
assignment, during a call to either of the
std::numpunctT::grouping(), std::numpunctT::truename(), etc.
functions. We should fix this and throw an exception instead. The
Standard doesn't say that any of these functions can throw, but it
doesn't say they can't throw either. And both __rw_get_numpunct() and
__rw_get_moneypunct() throw already.

--Stefan

-- 
Stefan Teleman
KDE e.V.
stefan.tele...@gmail.com


Re: STDCXX-1056 [was: Re: STDCXX forks]

2012-09-10 Thread Martin Sebor

On 09/10/2012 10:56 AM, Stefan Teleman wrote:

On Thu, Sep 6, 2012 at 6:43 PM, Stefan Telemanstefan.tele...@gmail.com  wrote:

On Thu, Sep 6, 2012 at 4:02 PM, Martin Sebormse...@gmail.com  wrote:


Here's a thought: it's not pretty but how about having
the function initialize the facet? It already has a pointer
to the base class, so it could downcast it to std::numpunct
(or moneypunct, respectively), and assign the initial values
to the members. Would that work?


I haven't looked at them in detail (yet) but a cursory look shows that
they're both recursive for the successful case.


It's not going to work that way.

For one, __rw_get_numpunct() and __rw_get_moneypunct() are static
functions in the __rw namespace. Neither can access or modify the
std::numpunctT  or std::moneypunctT  data members directly, because
they are private.


There are ways around it -- see, for example, the hack in
src/access.h. This gives us, the implementation, a way to
cleanly access private data without exposing them to user
programs. In our case, we could solve our problem by having
the facet declare some helper (defined only in punct.cpp)
a friend and invoking the helper from __rw_get_numpunct.

That said, I'd certainly prefer to avoid hacks as much as
possible. This problem could perhaps more cleanly be solved
by having the facet pass a reference to the string (or to
all of its internal data) to modify to the function (or
something like that). Unfortunately, it would break binary
compatibility.



Second, both __rw_get_numpunct() and __rw_get_moneypunct() are
recursive. Unless we want to start playing with
PTHREAD_MUTEX_RECURSIVE, which I'm not at all sure is supported on all
the platforms we support, we're not going to be able to solve the
thread-safety problem here (Linux supports it as
PTHREAD_MUTEX_RECURSIVE_NP, Solaris supports it
PTHREAD_MUTEX_RECURSIVE).


If I remember correctly, the recursion is quite simple. The
first stage (pfacet-_C_data () != 0) sets up the internal
data and takes place just once per facet object. The second
(recursive) stage then extracts and returns it. I only looked
at it briefly last week but from what I saw I'd say it should
be possible to lock only the second, non-recursive stage and
do the assignment there.



Third, both __rw_get_numpunct() and __rw_get_moneypunct() can return a
NULL pointer. This is bad, because it will cause a SEGV at string
assignment, during a call to either of the
std::numpunctT::grouping(), std::numpunctT::truename(), etc.
functions. We should fix this and throw an exception instead. The
Standard doesn't say that any of these functions can throw, but it
doesn't say they can't throw either. And both __rw_get_numpunct() and
__rw_get_moneypunct() throw already.


The functions only return NULL on internal error (i.e., a bug
in the implementation). There's an assertion right above the
return statement that will fire in a debug build. There's no
point in throwing an exception to the user for our own bugs
(they would have no way to recover from it). It's better to
abort. We might want to do that (i.e., abort) unconditionally
in this case, even when assertions are disabled.

Martin



--Stefan





Re: STDCXX-1056 [was: Re: STDCXX forks]

2012-09-10 Thread Stefan Teleman
On Mon, Sep 10, 2012 at 2:21 PM, Liviu Nicoara nikko...@hates.ms wrote:

 4. Without caching of grouping values, grouping() delegates always to
 do_grouping():

 real0m5.668s
 user1m11.389s
 sys 0m3.952s

FWIW, 22.2.3.1.1 explicitly states that all of the decimal_point(),
grouping(), truename(), falsename() must return their do_*()
counterparts.

--Stefan

-- 
Stefan Teleman
KDE e.V.
stefan.tele...@gmail.com


Re: STDCXX-1056 [was: Re: STDCXX forks]

2012-09-10 Thread Liviu Nicoara

On 09/10/12 15:01, Stefan Teleman wrote:

On Mon, Sep 10, 2012 at 2:21 PM, Liviu Nicoara nikko...@hates.ms wrote:


4. Without caching of grouping values, grouping() delegates always to
do_grouping():

real0m5.668s
user1m11.389s
sys 0m3.952s


FWIW, 22.2.3.1.1 explicitly states that all of the decimal_point(),
grouping(), truename(), falsename() must return their do_*()
counterparts.


Yes, of course. :-) I just meant to say that is _always_ delegating to 
do_grouping(), i.e., does no caching.

Liviu



--Stefan




--
And now I see with eye serene
The very pulse of the machine.


Re: STDCXX-1056 [was: Re: STDCXX forks]

2012-09-10 Thread Stefan Teleman
On Mon, Sep 10, 2012 at 1:32 PM, Martin Sebor mse...@gmail.com wrote:


 That said, I'd certainly prefer to avoid hacks as much as
 possible. This problem could perhaps more cleanly be solved
 by having the facet pass a reference to the string (or to
 all of its internal data) to modify to the function (or
 something like that). Unfortunately, it would break binary
 compatibility.

I think I have something which doesn't break BC - stay tuned because
I'm testing it now.

--Stefan

-- 
Stefan Teleman
KDE e.V.
stefan.tele...@gmail.com


Re: STDCXX-1056 [was: Re: STDCXX forks]

2012-09-07 Thread Liviu Nicoara

On 09/06/12 19:54, Martin Sebor wrote:

I'm not sure how easily we can do that. Almost all of locale
is initialized lazily. Some of the layers might depend on the
facets being initialized lazily as well. This was a deliberate
design choice. One of the constraints was to avoid dynamic
initialization or allocation at startup. [...]


There would be a performance degradation. IMHO, it would be minor and would 
simplify the code considerably.

After finally being able to reproduce the defect with SunPro 12.3 on x86_64 I 
tried to remove the lazy initialization and a (smaller) test case now passes. I 
am attaching the program and the numpunct patch.


Out of curiosity, does it still work if you move the locale into
the thread function? Like this:


I created the test case because I wanted something more specific and with more 
predictable results. Initially, the locale object was created in the thread and 
it was passing.




Also, does the 27.objects test pass with this patch?


Will try. I don't think that we can actually use this patch, I bundled it with 
the test so that people can test the approach right away.



I don't know if we have tests for it but writing to cerr/cout
in a bad_alloc handler should succeed. I.e., this should not
abort:

   try {
   struct rlimit rlim = { };
   getrlimit (RLIMIT_DATA, rlim);
   rlim.rlim_cur = 0;
   setrlimit (RLIMIT_DATA, rlim);

   for ( ; ; ) new int;
   }
   catch (bad_alloc) {
   cout  caught bad alloc:  123  '\n';
   }


I see..

Liviu



Re: STDCXX-1056 [was: Re: STDCXX forks]

2012-09-07 Thread Liviu Nicoara

On 09/06/12 22:58, Stefan Teleman wrote:

On Thu, Sep 6, 2012 at 7:31 PM, Liviu Nicoara nikko...@hates.ms wrote:


There would be a performance degradation. IMHO, it would be minor and would 
simplify the code considerably.

After finally being able to reproduce the defect with SunPro 12.3 on x86_64 I 
tried to remove the lazy initialization and a (smaller) test case now passes. I 
am attaching the program and the numpunct patch.


With your patches, the performance is much much better:


I didn't think about timing it, thanks! The result is somewhat expected without 
the synchronization. However, the full construction-time initialization is 
inappropriate if you take in consideration the delegation from the public API 
to the protected virtual API of the facet class, and the possibility of 
overrides in subclasses. Also, lazy initialization follows the principle of 
economy of effort, and caching increases efficiency. If possible in a robust 
way, I would rather have those two. :-)

Liviu



# INFO (S1) (10 lines):
# TEXT:
# COMPILER: Intel C++, __INTEL_COMPILER = 1210,
__INTEL_COMPILER_BUILD_DATE = 20111011, __EDG_VERSION__ = 403
# ENVIRONMENT: pentiumpro running linux-elf (Fedora release 17 (Beefy
Miracle) (3.5.0-2.fc17.x86_64)) with glibc 2.15
# FILE: 22.locale.numpunct.mt.cpp
# COMPILED: Sep  6 2012, 20:50:13
# COMMENT: thread safety


# +---+--+--+--+
# | DIAGNOSTIC|  ACTIVE  |   TOTAL  | INACTIVE |
# +---+--+--+--+
# | (S1) INFO |   11 |   11 |   0% |
# | (S2) NOTE |1 |1 |   0% |
# | (S8) ERROR|0 |3 | 100% |
# | (S9) FATAL|0 |1 | 100% |
# +---+--+--+--+
real 1035.05
user 1191.76
sys 63.49

--Stefan




--
And now I see with eye serene
The very pulse of the machine.


dbx [was: Re: STDCXX-1056 [was: Re: STDCXX forks]]

2012-09-07 Thread Liviu Nicoara

On 09/07/12 11:58, Stefan Teleman wrote:

On Fri, Sep 7, 2012 at 8:52 AM, Liviu Nicoara nikko...@hates.ms wrote:

On 09/06/12 19:54, Martin Sebor wrote:



Also, does the 27.objects test pass with this patch?



No, it does not. It hangs at the first insertion, line 227. Unfortunately, I
cannot debug it because dbx does not work properly in my installation.


What are the symptoms with dbx mishbehaving? (maybe I can help).


Thanks!

I get this when launching the debugger:

$ dbx -xexec32 t
For information about new features see `help changes'
To remove this message, put `dbxenv suppress_startup_message 7.9' in your .dbxrc
Reading t
Reading ld-linux.so.2
dbx: fetch at 0xf400 failed -- Input/output error
dbx: warning: could not put in breakpoint
dbx: warning: internal handler (-80) made defunct -- could not enable event BPT
dbx: warning: internal handler (-86) made defunct -- could not enable event BPT
...

Then the program just runs from the get go. If I set breakpoints and rerun, it 
ignores them.

Also, long running programs cannot be broken into with C-c:

^Cdbx: warning: Interrupt ignored but forwarded to child.
signal INT in (unknown) at 0xf77cf6c4
0xf77cf6c4: movl %edx,%eax
dbx: warning: 'stop' ignored -- while doing rtld handshake
terminating signal 2 SIGINT
dbx: warning: 'stop' ignored -- while doing rtld handshake
(dbx) where
dbx: program is not active
(dbx)

Liviu


Re: dbx [was: Re: STDCXX-1056 [was: Re: STDCXX forks]]

2012-09-07 Thread Stefan Teleman
On Fri, Sep 7, 2012 at 12:23 PM, Liviu Nicoara nikko...@hates.ms wrote:

 I get this when launching the debugger:

 $ dbx -xexec32 t
 For information about new features see `help changes'
 To remove this message, put `dbxenv suppress_startup_message 7.9' in your
 .dbxrc
 Reading t
 Reading ld-linux.so.2
 dbx: fetch at 0xf400 failed -- Input/output error
 dbx: warning: could not put in breakpoint

There is fix for this, but for Studio 12.2:

http://wesunsolve.net/bugid/id/6545393

It had priority 5 (very low) so that leads me to assume that it hasn't
been fixed yet in 12.3. But I will ask at work about 12.3/Linux.

Strangely enough, I don't get the error on Fedora 17 with 12.3.

--Stefan

-- 
Stefan Teleman
KDE e.V.
stefan.tele...@gmail.com


Re: STDCXX-1056 [was: Re: STDCXX forks]

2012-09-06 Thread Liviu Nicoara

On 09/05/12 23:51, Martin Sebor wrote:

On 09/05/2012 09:03 PM, Stefan Teleman wrote:

[...]
Agreed.

But: if the choice is between an implementation which [1] breaks ABI
and [2] performs 20% worse -- even in contrived test cases -- than
another implementation [2] which doesn't break ABI, and performs
better than the first one,  why would we even consider the first one?


Breaking the ABI is not an option (unless we rev the version).
But I'm not sure I understand what you think breaks the ABI.


I think Stefan is referring to adding a mutex member variable to the facet in 
question and breaking binary compatibility. If that is the case I have confused 
things when I suggested exactly that, earlier. A cursory read through the 
__rw_facet source shows that inherits from __rw_synchronized in MT builds, 
therefore each facet carries its own mutex member.

Liviu


We don't need to add a new mutex -- we can use the __rw_facet
member for the locking. Or did you mean something else?

Martin



--Stefan




--
And now I see with eye serene
The very pulse of the machine.


Re: STDCXX-1056 [was: Re: STDCXX forks]

2012-09-06 Thread Stefan Teleman
On Thu, Sep 6, 2012 at 9:16 AM, Liviu Nicoara nikko...@hates.ms wrote:

 I think Stefan is referring to adding a mutex member variable to the facet
 in question and breaking binary compatibility. If that is the case I have
 confused things when I suggested exactly that, earlier. A cursory read
 through the __rw_facet source shows that inherits from __rw_synchronized in
 MT builds, therefore each facet carries its own mutex member.

 On 09/05/12 23:51, Martin Sebor wrote:
 We don't need to add a new mutex -- we can use the __rw_facet
 member for the locking. Or did you mean something else?

A possible implementation using the __rw_facet mutex could look like this:

template class _CharT
inline string numpunct_CharT::grouping () const
{
if (!(_C_flags  _RW::__rw_gr)) {

numpunct* const __self = _RWSTD_CONST_CAST (numpunct*, this);

_RWSTD_MT_GUARD (__self-_C_mutex);

if (!(_C_flags  _RW::__rw_gr)) {

// [try to] get the grouping first (may throw)
// then set a flag to avoid future initializations
__self-_C_grouping  = do_grouping ();
__self-_C_flags|= _RW::__rw_gr;

}
}

return _C_grouping;
}

Except that it will not work. Because the __rw_facet mutex member is
being locked  in file ../src/facet.cpp in function
__rw_facet::_C_manage at line 366:

// acquire lock
_RWSTD_MT_STATIC_GUARD (_RW::__rw_facet);

This will deadlock because this is the mutex already locked by
std::numpunctT::grouping().

I've already tested this with 3 compilers, and, it does indeed deadlock.

So yes, I did indeed mean something different. I meant adding another
mutex data member to the numpunct class.

--Stefan

-- 
Stefan Teleman
KDE e.V.
stefan.tele...@gmail.com


Re: STDCXX-1056 [was: Re: STDCXX forks]

2012-09-06 Thread Martin Sebor

On 09/06/2012 09:58 AM, Stefan Teleman wrote:

On Thu, Sep 6, 2012 at 9:16 AM, Liviu Nicoaranikko...@hates.ms  wrote:


I think Stefan is referring to adding a mutex member variable to the facet
in question and breaking binary compatibility. If that is the case I have
confused things when I suggested exactly that, earlier. A cursory read
through the __rw_facet source shows that inherits from __rw_synchronized in
MT builds, therefore each facet carries its own mutex member.



On 09/05/12 23:51, Martin Sebor wrote:

We don't need to add a new mutex -- we can use the __rw_facet
member for the locking. Or did you mean something else?


A possible implementation using the __rw_facet mutex could look like this:

templateclass _CharT
inline string numpunct_CharT::grouping () const
{
 if (!(_C_flags  _RW::__rw_gr)) {

 numpunct* const __self = _RWSTD_CONST_CAST (numpunct*, this);

 _RWSTD_MT_GUARD (__self-_C_mutex);

 if (!(_C_flags  _RW::__rw_gr)) {

 // [try to] get the grouping first (may throw)
 // then set a flag to avoid future initializations
 __self-_C_grouping  = do_grouping ();
 __self-_C_flags|= _RW::__rw_gr;

 }
 }

 return _C_grouping;
}

Except that it will not work. Because the __rw_facet mutex member is
being locked  in file ../src/facet.cpp in function
__rw_facet::_C_manage at line 366:

// acquire lock
_RWSTD_MT_STATIC_GUARD (_RW::__rw_facet);


This locks a different mutex, one unrelated to __rw_facet::_C_mutex.

Look at the implementation of _RWSTD_MT_STATIC_GUARD() in rw/_defs.h:

  #  define _RWSTD_MT_STATIC_GUARD(type) \
typedef _RW::__rw_typetype,__LINE__ _UniqueType;   \
_RWSTD_MT_CLASS_GUARD (_UniqueType)

  #  define _RWSTD_MT_CLASS_GUARD(type)   \
_RWSTD_MT_GUARD (_RW::__rw_get_static_mutex ((type*)0))

and then at __rw_get_static_mutex in rw/_mutex.h. It gets a static
(global) mutex from a mutex factory via template instantiation:

  __rw_static_mutex__rw_facet::_C_mutex;



This will deadlock because this is the mutex already locked by
std::numpunctT::grouping().

I've already tested this with 3 compilers, and, it does indeed deadlock.


Something else must be locking the mutex then. I quickly looked
at __rw_get_punct (and __rw_get_numpunct) in punct.cpp but I
couldn't find any signs of the mutex being locked there. The
only thing I see being locked there is the global C locale. If
it's not the callee it has to be the caller.



So yes, I did indeed mean something different. I meant adding another
mutex data member to the numpunct class.


We can't (and shouldn't need to) add one. We need to be able to
make do with the existing member. That's what it's there for.

Martin



--Stefan





Re: STDCXX-1056 [was: Re: STDCXX forks]

2012-09-06 Thread Stefan Teleman
On Thu, Sep 6, 2012 at 2:46 PM, Stefan Teleman stefan.tele...@gmail.com wrote:

 [steleman@darthvader][/src/steleman/programming/stdcxx-ss122/stdcxx-4.2.1/build/tests][09/06/2012
 14:40:11][1084] ./22.locale.numpunct.mt --nthreads=2 --nloops=100
 # INFO (S1) (10 lines):
 # TEXT:
 # COMPILER: SunPro, __SUNPRO_CC = 0x5120
 # ENVIRONMENT: i386 running linux (Fedora release 17 (Beefy Miracle)
 (3.5.0-2.fc17.x86_64)) with glibc 2.15
 # FILE: 22.locale.numpunct.mt.cpp
 # COMPILED: Sep  6 2012, 14:38:42
 # COMMENT: thread safety
 

 # CLAUSE: lib.locale.numpunct

 # NOTE (S2) (5 lines):
 # TEXT: executing /usr/bin/locale -a  /tmp/tmpfile-YzXcb9
 # CLAUSE: lib.locale.numpunct
 # FILE: process.cpp
 # LINE: 276

 grouping: _RWSTD_MT_GUARD (__self-_C_mutex): 0xf774913c
 _C_get_data: _RWSTD_MT_GUARD (_C_mutex): 0xf774913c

 [ ... deadlock ... ]

And just to make absolutely sure that this isn't a case of SunPro
being insane, here's the output from the Intel compiler:

[steleman@darthvader][/src/steleman/programming/stdcxx-intel/stdcxx-4.2.1-thread-safe/build/tests][09/06/2012
15:44:23][1390] ./22.locale.numpunct.mt --nthreads=2 --nloops=100
# INFO (S1) (10 lines):
# TEXT:
# COMPILER: Intel C++, __INTEL_COMPILER = 1210,
__INTEL_COMPILER_BUILD_DATE = 20111011, __EDG_VERSION__ = 403
# ENVIRONMENT: pentiumpro running linux-elf (Fedora release 17 (Beefy
Miracle) (3.5.0-2.fc17.x86_64)) with glibc 2.15
# FILE: 22.locale.numpunct.mt.cpp
# COMPILED: Sep  6 2012, 15:43:29
# COMMENT: thread safety


# CLAUSE: lib.locale.numpunct

# NOTE (S2) (5 lines):
# TEXT: executing locale -a  /tmp/tmpfile-FoZR0J
# CLAUSE: lib.locale.numpunct
# FILE: process.cpp
# LINE: 276

grouping: _RWSTD_MT_GUARD (__self-_C_mutex): 0xf74c2424
_C_get_data: _RWSTD_MT_GUARD (_C_mutex): 0xf74c2424

[ ... deadlock ... ]

--Stefan

-- 
Stefan Teleman
KDE e.V.
stefan.tele...@gmail.com


Re: STDCXX-1056 [was: Re: STDCXX forks]

2012-09-06 Thread Liviu Nicoara

On Sep 5, 2012, at 4:03 PM, Martin Sebor wrote:

 On 09/05/2012 01:33 PM, Liviu Nicoara wrote:
 On 09/05/12 15:17, Martin Sebor wrote:
 On 09/05/2012 12:40 PM, Liviu Nicoara wrote:
 On 09/05/12 14:09, Stefan Teleman wrote:
 On Wed, Sep 5, 2012 at 10:52 AM, Martin Sebor mse...@gmail.com wrote:
 [...]
 OK so I did a little bit of testing, after looking at the *right*
 __rw_guard class. :-)
 
 I changed the std::numpunct class thusly:
 [...]
 
 I am afraid this would be unsafe, too (if I said otherwise earlier I was
 wrong). [...] Thoughts?
 
 You're right, there's still a problem. We didn't get the double
 checked locking quite right.
 
 I wish for simplicity: eliminate the lazy initialization, and fully
 initialize the facet in the constructor. Then we'd need no locking on
 copying its data (std::string takes care of its own copying).
 
 I'm not sure how easily we can do that. Almost all of locale
 is initialized lazily. Some of the layers might depend on the
 facets being initialized lazily as well. This was a deliberate
 design choice. One of the constraints was to avoid dynamic
 initialization or allocation at startup. [...]

There would be a performance degradation. IMHO, it would be minor and would 
simplify the code considerably.

After finally being able to reproduce the defect with SunPro 12.3 on x86_64 I 
tried to remove the lazy initialization and a (smaller) test case now passes. I 
am attaching the program and the numpunct patch. 

Will continue to look into it. Although STDCXX does not have an implementation 
of the atomics library we could probably use the existing, unexposed, atomics 
API to implement a more robust lazy initialization. 

Liviu

$ cat tests/localization/t.cpp; svn diff
#include locale

#include cstdio
#include cstdlib

#include unistd.h

#define MAX_THREADS16
#define MAX_LOOPS1000

static bool hold = true;

extern C {

static void* 
f (void* pv)
{
std::numpunctchar const fac = 
*reinterpret_cast std::numpunctchar*  (pv);

while (hold) ; 

for (int i = 0; i != MAX_LOOPS; ++i) {
std::string const grp = fac.grouping ();
}

return 0;
}

}

int
main (int argc, char** argv)
{
std::locale const loc = std::locale (argv [1]);

std::numpunctchar const fac =
std::use_facetstd::numpunctchar  (loc);

pthread_t tid [MAX_THREADS] = { 0 };

for (int i = 0; i  MAX_THREADS; ++i) {
if (pthread_create (tid + i, 0, f, (void*)fac)) 
exit (-1);
}

sleep (1);

hold = false;

for (int i = 0; i  MAX_THREADS; ++i) {
if (tid [i])
pthread_join (tid [i], 0);
}

return 0;
}

Index: include/loc/_numpunct.h
===
--- include/loc/_numpunct.h (revision 1381575)
+++ include/loc/_numpunct.h (working copy)
@@ -61,24 +61,40 @@ struct numpunct: _RW::__rw_facet
 string_type;
 
 _EXPLICIT numpunct (_RWSTD_SIZE_T __ref = 0)
-: _RW::__rw_facet (__ref), _C_flags (0) { }
+: _RW::__rw_facet (__ref) {
+_C_grouping   = do_grouping ();
+_C_truename   = do_truename ();
+_C_falsename  = do_falsename ();
+_C_decimal_point  = do_decimal_point ();
+_C_thousands_sep  = do_thousands_sep ();
+}
 
 virtual ~numpunct () _RWSTD_ATTRIBUTE_NOTHROW;
 
 // 22.2.3.1.1, p1
-char_type decimal_point () const;
+char_type decimal_point () const {
+return _C_decimal_point;
+}
 
 // 22.2.3.1.1, p2
-char_type thousands_sep () const;
+char_type thousands_sep () const {
+return _C_thousands_sep;
+}
 
 // 22.2.3.1.1, p3
-string grouping () const;
+string grouping () const {
+return _C_grouping;
+}
 
 // 22.2.3.1.1, p4
-string_type truename () const;
+string_type truename () const {
+return _C_truename;
+}
 
 // 22.2.3.1.1, p4
-string_type falsename () const;
+string_type falsename () const {
+return _C_falsename;
+}
 
 static _RW::__rw_facet_id id;
 
@@ -112,15 +128,13 @@ protected:
 
 private:
 
-int _C_flags;   // bitmap of cached data valid flags
-string  _C_grouping;// cached results of virtual members
+string  _C_grouping;
 string_type _C_truename;
 string_type _C_falsename;
 char_type   _C_decimal_point;
 char_type   _C_thousands_sep;
 };
 
-
 #ifndef _RWSTD_NO_SPECIALIZED_FACET_ID
 
 _RWSTD_SPECIALIZED_CLASS
@@ -134,95 +148,6 @@ _RW::__rw_facet_id numpunctwchar_t::id
 #  endif   // _RWSTD_NO_WCHAR_T
 #endif   // _RWSTD_NO_SPECIALIZED_FACET_ID
 
-
-template class _CharT
-inline _TYPENAME numpunct_CharT::char_type
-numpunct_CharT::decimal_point () const
-{
-if (!(_C_flags  _RW::__rw_dp)) {
-
-numpunct* const __self = _RWSTD_CONST_CAST (numpunct*, this);
-
-// [try to] get the decimal point first (may throw)
-// then set a flag to avoid future 

Re: STDCXX-1056 [was: Re: STDCXX forks]

2012-09-06 Thread Martin Sebor

I'm not sure how easily we can do that. Almost all of locale
is initialized lazily. Some of the layers might depend on the
facets being initialized lazily as well. This was a deliberate
design choice. One of the constraints was to avoid dynamic
initialization or allocation at startup. [...]


There would be a performance degradation. IMHO, it would be minor and would 
simplify the code considerably.

After finally being able to reproduce the defect with SunPro 12.3 on x86_64 I 
tried to remove the lazy initialization and a (smaller) test case now passes. I 
am attaching the program and the numpunct patch.


Out of curiosity, does it still work if you move the locale into
the thread function? Like this:


static void*
f (void* pv)
{
 std::numpunctchar  const  fac =
 *reinterpret_cast  std::numpunctchar*  (pv);

 while (hold) ;

 for (int i = 0; i != MAX_LOOPS; ++i) {


   const std::locale loc (en_US.utf8);

   typedef std::numpunctchar NumPunct;
   NumPunct const fac = std::use_facetNumPunct(loc);


 std::string const grp = fac.grouping ();
 }

 return 0;
}



Also, does the 27.objects test pass with this patch?

I don't know if we have tests for it but writing to cerr/cout
in a bad_alloc handler should succeed. I.e., this should not
abort:

  try {
  struct rlimit rlim = { };
  getrlimit (RLIMIT_DATA, rlim);
  rlim.rlim_cur = 0;
  setrlimit (RLIMIT_DATA, rlim);

  for ( ; ; ) new int;
  }
  catch (bad_alloc) {
  cout  caught bad alloc:  123  '\n';
  }

Martin


Re: STDCXX-1056 [was: Re: STDCXX forks]

2012-09-06 Thread Stefan Teleman
On Thu, Sep 6, 2012 at 7:31 PM, Liviu Nicoara nikko...@hates.ms wrote:

 There would be a performance degradation. IMHO, it would be minor and would 
 simplify the code considerably.

 After finally being able to reproduce the defect with SunPro 12.3 on x86_64 I 
 tried to remove the lazy initialization and a (smaller) test case now passes. 
 I am attaching the program and the numpunct patch.

With your patches, the performance is much much better:

# INFO (S1) (10 lines):
# TEXT:
# COMPILER: Intel C++, __INTEL_COMPILER = 1210,
__INTEL_COMPILER_BUILD_DATE = 20111011, __EDG_VERSION__ = 403
# ENVIRONMENT: pentiumpro running linux-elf (Fedora release 17 (Beefy
Miracle) (3.5.0-2.fc17.x86_64)) with glibc 2.15
# FILE: 22.locale.numpunct.mt.cpp
# COMPILED: Sep  6 2012, 20:50:13
# COMMENT: thread safety


# +---+--+--+--+
# | DIAGNOSTIC|  ACTIVE  |   TOTAL  | INACTIVE |
# +---+--+--+--+
# | (S1) INFO |   11 |   11 |   0% |
# | (S2) NOTE |1 |1 |   0% |
# | (S8) ERROR|0 |3 | 100% |
# | (S9) FATAL|0 |1 | 100% |
# +---+--+--+--+
real 1035.05
user 1191.76
sys 63.49

--Stefan

-- 
Stefan Teleman
KDE e.V.
stefan.tele...@gmail.com


Re: STDCXX-1056 [was: Re: STDCXX forks]

2012-09-05 Thread Liviu Nicoara

On 09/05/12 00:51, Stefan Teleman wrote:

On Tue, Sep 4, 2012 at 10:49 PM, Martin Sebor mse...@gmail.com wrote:


   template class _CharT
   inline string numpunct_CharT::grouping () const
   {
   if (!(_C_flags  _RW::__rw_gr)) {

   numpunct* const __self = _RWSTD_CONST_CAST (numpunct*, this);

  _RWSTD_MT_GUARD (_C_mutex);

  // [try to] get the grouping first (may throw)
   // then set a flag to avoid future initializations
   __self-_C_grouping  = do_grouping ();
   __self-_C_flags|= _RW::__rw_gr;
   }

   return _C_grouping;
   }


That's what I wanted to do originally - use a per-object mutex.


FWIW, it is pretty obvious to me _now_ that these assignments are not MT-safe, 
by themselves and also in the context of the test guarding them.

You're right, access must be synchronized on a per-facet object basis. Since 
the data that needs to be protected on assignment is instance data, the mutex 
used in the guard must be a (yet to be added) instance mutex. That would make 
it binary incompatible and would go in ... 4.3.x?




This works:

template class _CharT
inline string numpunct_CharT::grouping () const
{
 if (!(_C_flags  _RW::__rw_gr)) {

 numpunct* const __self = _RWSTD_CONST_CAST (numpunct*, this);

 _RWSTD_MT_STATIC_GUARD (_Type);


It seems to me that a static guard is excessive. The only thing that needs 
guarding is the actual assignment to facet instance members -- a mutex instance member 
would suffice.



And even so, this is still not thread-safe:

Two different threads [ T1 and T2 ], seeking two different locales
[en_US.UTF-8 and ja_JP.UTF-8], enter std::numpunct_CharT::grouping()
at the same time - because they are running on two different cores.
They both test for

 if (!(_C_flags  _RW::__rw_gr))

and then -- assuming the expression above evaluates to true -- one of
them wins the mutex [T1], and the other one [T2] blocks on the mutex.

When T1 is done and exits the function, the grouping is set to
en_US.UTF-8 and the mutex is released. Now T2 acquires the mutex, and
proceeds to setting grouping to ja-JP.UTF-8. Woe on the caller running
from T1 who now thinks he got en_US.UTF-8, but instead he gets
ja_JP.UTF-8, which was duly set so by T2, but T1 had no clue about it
(remember, the std::string grouping _charT buffer is shared by the
caller from T1 and the caller from T2).


I am pretty sure that in your example, they would operate on two different instances of the facet 
class. The static guard would synchronize their running alright but through two 
different facet instances, copying data from two different places of the locale database. In this 
respect a static guard is excessive.

I would defer it to Martin to validate a course of action but to me it looks 
like the only problem is the concurrent assignment to facet instance member 
variables inside the facet member functions. Which could be easily and nicely 
solved with a plain guard on a mutex ordinary member variable.

Thanks!

Liviu


Re: STDCXX-1056 [was: Re: STDCXX forks]

2012-09-05 Thread Martin Sebor

That's what I wanted to do originally - use a per-object mutext.
Unfortunately the _C_mutex member in rw::__rw_synchronized is static:


That's the wrong __rw_synchronized -- this one is for non-MT builds.
The one we get in MT builds in on line 370.



struct __rw_synchronized
{
 // static so that it takes up no space
 static _RWSTD_EXPORT __rw_mutex _C_mutex;

...

and __rw::rw_guard doesn't have an appropriate constructor.

Intel C++ complains about it too:

/src/steleman/programming/stdcxx-intel/stdcxx-4.2.1-thread-safe/include/loc/_numpunct.h(181):
error: no instance of constructor __rw::__rw_guard::__rw_guard
matches the argument list
 argument types are: (const __rw::__rw_mutex)
   _RWSTD_MT_GUARD (_C_mutex);
   ^


That's because the function is const and the guard takes a non-const
reference. This might be a defect in  __rw_synchronized: the mutex
member should probably be mutable.




...

And even so, this is still not thread-safe:

Two different threads [ T1 and T2 ], seeking two different locales
[en_US.UTF-8 and ja_JP.UTF-8], enter std::numpunct_CharT::grouping()
at the same time - because they are running on two different cores.


Those threads won't have the same numpunct facet. A single facet
is always associated with exactly one locale and the relationship
can never change. The locale class guarantees this.


They both test for

 if (!(_C_flags  _RW::__rw_gr))

and then -- assuming the expression above evaluates to true -- one of
them wins the mutex [T1], and the other one [T2] blocks on the mutex.

When T1 is done and exits the function, the grouping is set to
en_US.UTF-8 and the mutex is released. Now T2 acquires the mutex, and
proceeds to setting grouping to ja-JP.UTF-8. Woe on the caller running
from T1 who now thinks he got en_US.UTF-8, but instead he gets
ja_JP.UTF-8, which was duly set so by T2, but T1 had no clue about it
(remember, the std::string grouping _charT buffer is shared by the
caller from T1 and the caller from T2).

So at a minimum, the locking must happen before evaluating the

 if (!(_C_flags  _RW::__rw_gr))

expression.


You're right that there is a race here. But the race is benign
because the T2 will assign the same value to the string as T1
did (because the grouping must be the same in the same locale).
This doesn't reallocate the string but simply overwrites each
byte with its own value. On all architectures we support this
is atomic and safe.

Martin



This still doesn't solve what ends up being returned in grouping. If
we lock at the top of the function, then, when T2 acquires the mutex,
the test expression will evaluate to false. Therefore T2 will return
whatever is in grouping right now, which happens to be en_US.UTF-8 as
set by T1, when T2 really wanted ja_JP.UTF-8.

I really think the appropriate fix here -- which would address the
performance implications -- is more complex than this. I am thinking
about creating and using a (non-publicly accessible) internal locale
cache:

typedef std::mapstd::string, std::locale  locale_cache;

where all the locales are stored fully initialized, on demand. There
is only one locale instantiation and initialization overhead cost per
locale. After a locale has been instantiated and placed into the
cache, the caller of any specfic locale gets a copy from the cache,
fully instantiated and initialized. But this breaks ABI, so I'm
thinking it's for stdcxx 5.

Thoughts?

--Stefan





Re: STDCXX-1056 [was: Re: STDCXX forks]

2012-09-05 Thread Stefan Teleman
On Wed, Sep 5, 2012 at 10:52 AM, Martin Sebor mse...@gmail.com wrote:

 You're right that there is a race here. But the race is benign
 because the T2 will assign the same value to the string as T1
 did (because the grouping must be the same in the same locale).
 This doesn't reallocate the string but simply overwrites each
 byte with its own value. On all architectures we support this
 is atomic and safe.

OK so I did a little bit of testing, after looking at the *right*
__rw_guard class. :-)

I changed the std::numpunct class thusly:

template class _CharT
class numpunct : public _RW::__rw_facet
{
 public:
 // [ ... snip, no changes here ... ]

 private:

int _C_flags;   // bitmap of cached data valid flags
string  _C_grouping;// cached results of virtual members
string_type _C_truename;
string_type _C_falsename;
char_type   _C_decimal_point;
char_type   _C_thousands_sep;
mutable _RW::__rw_mutex  _C_object_mutex; // -
};

And then:

template class _CharT
inline string numpunct_CharT::grouping () const
{
if (!(_C_flags  _RW::__rw_gr)) {

_RWSTD_MT_GUARD (_C_object_mutex);

// double-test here to avoid re-writing an already written string
if (!(_C_flags  _RW::__rw_gr)) {

numpunct* const __self = _RWSTD_CONST_CAST (numpunct*, this);

// [try to] get the grouping first (may throw)
// then set a flag to avoid future initializations
__self-_C_grouping  = do_grouping ();
__self-_C_flags|= _RW::__rw_gr;

}
}

return _C_grouping;
}

same changes for std::numpuncttruename() and std::numpunct::falsename().

Ran 22.locale.numpunct.mt with the default values for nthreads and nloops.

Yesterday's test results, with the static per-class mutex:

# INFO (S1) (10 lines):
# TEXT:
# COMPILER: Intel C++, __INTEL_COMPILER = 1210,
__INTEL_COMPILER_BUILD_DATE = 20111011, __EDG_VERSION__ = 403
# ENVIRONMENT: pentiumpro running linux-elf (Fedora release 17 (Beefy
Miracle) (3.5.0-2.fc17.x86_64)) with glibc 2.15
# FILE: 22.locale.numpunct.mt.cpp
# COMPILED:  Sep  4 2012, 09:11:36
# COMMENT: thread safety


# CLAUSE: lib.locale.numpunct

# NOTE (S2) (5 lines):
# TEXT: executing locale -a  /tmp/tmpfile-V7siTq
# CLAUSE: lib.locale.numpunct
# FILE: process.cpp
# LINE: 276

# INFO (S1) (3 lines):
# TEXT: testing std::numpunctcharT with 8 threads, 20 iterations
each, in 32 locales { C aa_DJ aa_DJ.iso88591 aa_DJ.utf8
aa_ER aa_ER@saaho aa_ER.utf8 aa_ER.utf8@saaho aa_ET
aa_ET.utf8 af_ZA af_ZA.iso88591 af_ZA.utf8 am_ET
am_ET.utf8 an_ES an_ES.iso885915 an_ES.utf8 ar_AE
ar_AE.iso88596 ar_AE.utf8 ar_BH ar_BH.iso88596 ar_BH.utf8
ar_DZ ar_DZ.iso88596 ar_DZ.utf8 ar_EG ar_EG.iso88596
ar_EG.utf8 ar_IN ar_IN.utf8 }
# CLAUSE: lib.locale.numpunct

[ ... ]

# +---+--+--+--+
# | DIAGNOSTIC|  ACTIVE  |   TOTAL  | INACTIVE |
# +---+--+--+--+
# | (S1) INFO |   11 |   11 |   0% |
# | (S2) NOTE |1 |1 |   0% |
# | (S8) ERROR|0 |3 | 100% |
# | (S9) FATAL|0 |1 | 100% |
# +---+--+--+--+
real 2139.31
user 2406.09
sys 155.61

===

Today's test results with the per-object mutex (as shown above):

# INFO (S1) (10 lines):
# TEXT:
# COMPILER: Intel C++, __INTEL_COMPILER = 1210,
__INTEL_COMPILER_BUILD_DATE = 20111011, __EDG_VERSION__ = 403
# ENVIRONMENT: pentiumpro running linux-elf (Fedora release 17 (Beefy
Miracle) (3.5.0-2.fc17.x86_64)) with glibc 2.15
# FILE: 22.locale.numpunct.mt.cpp
# COMPILED: Sep  5 2012, 13:08:03
# COMMENT: thread safety


# CLAUSE: lib.locale.numpunct

# NOTE (S2) (5 lines):
# TEXT: executing locale -a  /tmp/tmpfile-ww0nez
# CLAUSE: lib.locale.numpunct
# FILE: process.cpp
# LINE: 276

# INFO (S1) (3 lines):
# TEXT: testing std::numpunctcharT with 8 threads, 20 iterations
each, in 32 locales { C aa_DJ aa_DJ.iso88591 aa_DJ.utf8
aa_ER aa_ER@saaho aa_ER.utf8 aa_ER.utf8@saaho aa_ET
aa_ET.utf8 af_ZA af_ZA.iso88591 af_ZA.utf8 am_ET
am_ET.utf8 an_ES an_ES.iso885915 an_ES.utf8 ar_AE
ar_AE.iso88596 ar_AE.utf8 ar_BH ar_BH.iso88596 ar_BH.utf8
ar_DZ ar_DZ.iso88596 ar_DZ.utf8 ar_EG ar_EG.iso88596
ar_EG.utf8 ar_IN ar_IN.utf8 }
# CLAUSE: lib.locale.numpunct

[ ... ]

# +---+--+--+--+
# | DIAGNOSTIC|  ACTIVE  |   TOTAL  | INACTIVE |
# +---+--+--+--+
# | (S1) INFO |   11 |   11 |   0% |
# | (S2) NOTE |1 |1 |   0% |
# | (S8) ERROR|0 |3 | 100% |
# | (S9) FATAL|0 |1 

Re: STDCXX-1056 [was: Re: STDCXX forks]

2012-09-05 Thread Liviu Nicoara

On 09/05/12 14:09, Stefan Teleman wrote:

On Wed, Sep 5, 2012 at 10:52 AM, Martin Sebor mse...@gmail.com wrote:
[...]
OK so I did a little bit of testing, after looking at the *right*
__rw_guard class. :-)

I changed the std::numpunct class thusly:
[...]
And then:

template class _CharT
inline string numpunct_CharT::grouping () const
{
 if (!(_C_flags  _RW::__rw_gr)) {

 _RWSTD_MT_GUARD (_C_object_mutex);

 // double-test here to avoid re-writing an already written string
 if (!(_C_flags  _RW::__rw_gr)) {

 numpunct* const __self = _RWSTD_CONST_CAST (numpunct*, this);

 // [try to] get the grouping first (may throw)
 // then set a flag to avoid future initializations
 __self-_C_grouping  = do_grouping ();
 __self-_C_flags|= _RW::__rw_gr;

 }
 }

 return _C_grouping;
}


I am afraid this would be unsafe, too (if I said otherwise earlier I was 
wrong). The compiler might re-arrange the protected assignments, such that 
another thread sees a partially updated object, where the flags are updated and 
the string not. I don't think we're going to get away with this here without 
either a simpler and more inefficient top-level locking, or doing away 
completely with the lazy initialization. Thoughts?

Liviu


Re: STDCXX-1056 [was: Re: STDCXX forks]

2012-09-05 Thread Martin Sebor

On 09/05/2012 12:40 PM, Liviu Nicoara wrote:

On 09/05/12 14:09, Stefan Teleman wrote:

On Wed, Sep 5, 2012 at 10:52 AM, Martin Sebor mse...@gmail.com wrote:
[...]
OK so I did a little bit of testing, after looking at the *right*
__rw_guard class. :-)

I changed the std::numpunct class thusly:
[...]
And then:

template class _CharT
inline string numpunct_CharT::grouping () const
{
if (!(_C_flags  _RW::__rw_gr)) {

_RWSTD_MT_GUARD (_C_object_mutex);

// double-test here to avoid re-writing an already written string
if (!(_C_flags  _RW::__rw_gr)) {

numpunct* const __self = _RWSTD_CONST_CAST (numpunct*, this);

// [try to] get the grouping first (may throw)
// then set a flag to avoid future initializations
__self-_C_grouping = do_grouping ();
__self-_C_flags |= _RW::__rw_gr;

}
}

return _C_grouping;
}


I am afraid this would be unsafe, too (if I said otherwise earlier I was
wrong). The compiler might re-arrange the protected assignments, such
that another thread sees a partially updated object, where the flags are
updated and the string not. I don't think we're going to get away with
this here without either a simpler and more inefficient top-level
locking, or doing away completely with the lazy initialization. Thoughts?


You're right, there's still a problem. We didn't get the double
checked locking quite right. We need to prevent the reordering
both by the compiler and by the hardware so that the reader
doesn't get an out of date value of _C_grouping. Making the
members volatile and/or moving the body of the first if into
an out-of-line function should help with the compiler reordering
but we'll need a barrier to prevent the hardware from serving us
up stale data (i.e., letting T2 see the updated _C_flags but
a _C_grouping that's still being modified). Calling
pthread_mutex_unlock does insert a barrier, but the barrier
is only executed when both tests pass, and not when the first
one fails. We need the barrier in both cases. This seems like
it would get too ugly (and big) to put in an inline function.

Martin



Liviu




Re: STDCXX-1056 [was: Re: STDCXX forks]

2012-09-05 Thread Martin Sebor

On 09/05/2012 01:33 PM, Liviu Nicoara wrote:

On 09/05/12 15:17, Martin Sebor wrote:

On 09/05/2012 12:40 PM, Liviu Nicoara wrote:

On 09/05/12 14:09, Stefan Teleman wrote:

On Wed, Sep 5, 2012 at 10:52 AM, Martin Sebor mse...@gmail.com wrote:
[...]
OK so I did a little bit of testing, after looking at the *right*
__rw_guard class. :-)

I changed the std::numpunct class thusly:
[...]


I am afraid this would be unsafe, too (if I said otherwise earlier I was
wrong). [...] Thoughts?


You're right, there's still a problem. We didn't get the double
checked locking quite right.


I wish for simplicity: eliminate the lazy initialization, and fully
initialize the facet in the constructor. Then we'd need no locking on
copying its data (std::string takes care of its own copying).


I'm not sure how easily we can do that. Almost all of locale
is initialized lazily. Some of the layers might depend on the
facets being initialized lazily as well. This was a deliberate
design choice. One of the constraints was to avoid dynamic
initialization or allocation at startup. Another was to be
able to use iostreams (to a limited extent) in low memory
conditions to write an error message to stderr. It's been
a long time and the details are more than a little fuzzy.
I'll need to spend some time going through the code and
refreshing my memory.

Martin



Liviu





Re: STDCXX-1056 [was: Re: STDCXX forks]

2012-09-05 Thread Pavel Heimlich, a.k.a. hajma
2012/9/5 Stefan Teleman stefan.tele...@gmail.com:
 On Tue, Sep 4, 2012 at 9:15 PM, Liviu Nicoara nikko...@hates.ms wrote:

 That is good, right?

 Yes, it is good. :-)


 Could not get an Intel build off the ground on the account of the LIMITS.cpp 
 test not running to completion because of a possible compiler bug. I posted 
 earlier an account of it. Do you have a support account that allows posting 
 bug reports for Intel's C++ compiler?

 Oh, yes I remember that now, it happened to me too. I worked around it
 by copying the LIMITS executable from a gcc build.

 I don't have a support contract with Intel - I'm using the free
 compilers at home. I'm guessing Intel must have an open forum for
 Intel developers using their compilers, so that might be a way to
 report the bug. But I haven't looked for it.



 I did not have access to a Solaris box since I left Rogue Wave. Does 
 Sun/Oracle have any program that would allow one to test drive their 
 compiler on a shared box somewhere? I vaguely remember that HP had something 
 like that a while ago.

 I know Sun had such a program. Oracle I don't really know but my
 educated guess is not a chance. ;-)

Gold and above members of the oracle parter network can* get access to
a lab running various Oracle HW and SW.
For regular persons Stefan's answer applies 100%.

However there's nothing stopping you from downloading Solaris 11 iso
or a virtualbox image from
http://www.oracle.com/technetwork/server-storage/solaris11/downloads/index.html
installing it in a virtualization solution of your choice (virtualbox
works best for me) and installing Studio from
http://www.oracle.com/technetwork/server-storage/solarisstudio/downloads/index.html

As long as the installation is used for sw development and not for
production use you do not need to pay Oracle a dime, you just won't
get any updates/patches. (check the licenses fo details, I'm no
lawyer)

P.

* this was announced in April, I have no idea about the current status


 One way of testing with SunPro is to download the free version of
 SunPro from Oracle, and use it on Linux (yes, SunPro exists for Linux
 as well). It works very well on OpenSUSE and Ubuntu (tested and used
 on both myself, debugged stdcxx on OpenSUSE with SunPro 12.2. Doesn't
 work at all on Fedora 17 because of the TLS glibc bug (however it used
 to on earlier Fedora releases).

 And now for the latest about 1056:

 I managed (after many, MANY tries) to get a run of
 22.locale.numpunct.mt on Solaris SPARC 32-bit build (at work), and
 without the thread-safety patches applied, and I hit the jackpot:

 # INFO (S1) (10 lines):
 # TEXT:
 # COMPILER: SunPro, __SUNPRO_CC = 0x5100
 # ENVIRONMENT: sparc-v8 running sunos-5.11
 # FILE: 22.locale.numpunct.mt.cpp
 # COMPILED: Sep  4 2012, 16:10:03
 # COMMENT: thread safety
 

 # CLAUSE: lib.locale.numpunct

 # NOTE (S2) (5 lines):
 # TEXT: executing /usr/bin/locale -a  /var/tmp/tmpfile-MWa4bQ
 # CLAUSE: lib.locale.numpunct
 # FILE: process.cpp
 # LINE: 276

 # INFO (S1) (3 lines):
 # TEXT: testing std::numpunctcharT with 4 threads, 1000 iterations
 each, in 32 locales { C POSIX af_ZA.UTF-8 ar_AE.UTF-8
 ar_BH.UTF-8 ar_DZ.UTF-8 ar_EG.ISO8859-6 ar_EG.UTF-8
 ar_IQ.UTF-8 ar_JO.UTF-8 ar_KW.UTF-8 ar_LY.UTF-8 ar_MA.UTF-8
 ar_OM.UTF-8 ar_QA.UTF-8 ar_SA.UTF-8 ar_TN.UTF-8 ar_YE.UTF-8
 as_IN.UTF-8 az_AZ.UTF-8 be_BY.UTF-8 bg_BG.ISO8859-5
 bg_BG.UTF-8 bn_IN.UTF-8 bs_BA.ISO8859-2 bs_BA.UTF-8
 ca_ES.ISO8859-1 ca_ES.ISO8859-15 ca_ES.UTF-8 cs_CZ.ISO8859-2
 cs_CZ.UTF-8 cs_CZ.UTF-8@euro }
 # CLAUSE: lib.locale.numpunct

 [ ... snip ... ]

 # INFO (S1) (4 lines):
 # TEXT: creating a thread pool with 4 threads
 # CLAUSE: lib.locale.numpunct
 # LINE: 548

 Abort (core dump)

 /builds/steleman/userland-stdcxx-upstream-thread-safety/components/stdcxx/stdcxx-4.2.1/tests/localization/22.locale.numpunct.mt.cpp:140:
 Assertion '0 == rw_strncmp (grp.c_str (), data.grouping_.c_str ())'
 failed.
 /builds/steleman/userland-stdcxx-upstream-thread-safety/components/stdcxx/stdcxx-4.2.1/build/lib/libstdcxx4.so.4.2.1'__1cE__rwQ__rw_assert_fail6Fpkc2i2_v_+0x78
 [0xff249af8]
 /builds/steleman/userland-stdcxx-upstream-thread-safety/components/stdcxx/stdcxx-4.2.1/build/tests/22.locale.numpunct.mt'thread_func+0x500
 [0x1b050]
 /lib/libc.so.1'_lwp_start+0x0 [0xf7b553f0]

 Bingo!

 At line 140 in 22.locale.numpunct.mt.cpp, we see:

  RW_ASSERT (0 == rw_strncmp (grp.c_str (),
 data.grouping_.c_str ()));

 What this means:

 Between the time std::string grp was initialized at line 133:

 const std::string grp = np.grouping ();

 and the time we hit line 140 (the RW_ASSERT), another thread modified
 the std::string _C_grouping member of class

 template class _CharT class numpunct : public _RW::__rw_facet.

 And because this is the non-patched version of _numpunct.h, the
 variable std::string grp did not really return a deep copy of
 _C_grouping, but a shared 

Re: STDCXX-1056 [was: Re: STDCXX forks]

2012-09-05 Thread Stefan Teleman
On Wed, Sep 5, 2012 at 4:20 PM, Stefan Teleman stefan.tele...@gmail.com wrote:

 But then there's another aspect  -- which I probably failed to
 highlight in my previous email: the per-object mutex implementation is
 20% *slower* than the class-static mutex implementation.

 class-static implementation:
 real 2139.31
 user 2406.09
 sys 155.61

 pe-object implementation:
 real 2416.75
 user 2694.64
 sys 159.49

The above results are for the Intel compiler. Now here are the results
with GCC 4.7.0:

1. with the static per-class mutex:

# INFO (S1) (10 lines):
# TEXT:
# COMPILER: gcc 4.7.0, __VERSION__ = 4.7.0 20120507 (Red Hat 4.7.0-5)
# ENVIRONMENT: pentiumpro running linux-elf (Fedora release 17 (Beefy
Miracle) (3.5.0-2.fc17.x86_64)) with glibc 2.15
# FILE: 22.locale.numpunct.mt.cpp
# COMPILED: Sep  5 2012, 06:21:18
# COMMENT: thread safety


[ ... ]

# +---+--+--+--+
# | DIAGNOSTIC|  ACTIVE  |   TOTAL  | INACTIVE |
# +---+--+--+--+
# | (S1) INFO |   11 |   11 |   0% |
# | (S2) NOTE |1 |1 |   0% |
# | (S8) ERROR|0 |3 | 100% |
# | (S9) FATAL|0 |1 | 100% |
# +---+--+--+--+
real 2165.06
user 2428.08
sys 151.30


2. With the per-object mutex:

# INFO (S1) (10 lines):
# TEXT:
# COMPILER: gcc 4.7.0, __VERSION__ = 4.7.0 20120507 (Red Hat 4.7.0-5)
# ENVIRONMENT: pentiumpro running linux-elf (Fedora release 17 (Beefy
Miracle) (3.5.0-2.fc17.x86_64)) with glibc 2.15
# FILE: 22.locale.numpunct.mt.cpp
# COMPILED: Sep  5 2012, 21:29:56
# COMMENT: thread safety


# CLAUSE: lib.locale.numpunct

 [ ... ]

# +---+--+--+--+
# | DIAGNOSTIC|  ACTIVE  |   TOTAL  | INACTIVE |
# +---+--+--+--+
# | (S1) INFO |   11 |   11 |   0% |
# | (S2) NOTE |1 |1 |   0% |
# | (S8) ERROR|0 |3 | 100% |
# | (S9) FATAL|0 |1 | 100% |
# +---+--+--+--+
real 2438.70
user 2726.44
sys 155.79

About the same percentage difference as the Intel compiler.

--Stefan

---
Stefan Teleman
KDE e.V.
stefan.tele...@gmail.com


Re: STDCXX-1056 [was: Re: STDCXX forks]

2012-09-05 Thread Stefan Teleman
On Wed, Sep 5, 2012 at 10:55 PM, Martin Sebor mse...@gmail.com wrote:

 I suspect the difference is due to the overhead of the repeated
 initialization and destruction of the per-object mutex in the
 test. The test repeatedly creates (and discards) named locale
 objects.

 The per-class mutex is initialized just once in the process, no
 matter how many facet objects (how many distinct named locales)
 the test creates. But the per-object mutex must be created (and
 destroyed) for each named locale.

Agreed.

But: if the choice is between an implementation which [1] breaks ABI
and [2] performs 20% worse -- even in contrived test cases -- than
another implementation [2] which doesn't break ABI, and performs
better than the first one,  why would we even consider the first one?

--Stefan

-- 
Stefan Teleman
KDE e.V.
stefan.tele...@gmail.com


Re: STDCXX-1056 [was: Re: STDCXX forks]

2012-09-05 Thread Martin Sebor

On 09/05/2012 09:03 PM, Stefan Teleman wrote:

On Wed, Sep 5, 2012 at 10:55 PM, Martin Sebor mse...@gmail.com wrote:


I suspect the difference is due to the overhead of the repeated
initialization and destruction of the per-object mutex in the
test. The test repeatedly creates (and discards) named locale
objects.

The per-class mutex is initialized just once in the process, no
matter how many facet objects (how many distinct named locales)
the test creates. But the per-object mutex must be created (and
destroyed) for each named locale.


Agreed.

But: if the choice is between an implementation which [1] breaks ABI
and [2] performs 20% worse -- even in contrived test cases -- than
another implementation [2] which doesn't break ABI, and performs
better than the first one,  why would we even consider the first one?


Breaking the ABI is not an option (unless we rev the version).
But I'm not sure I understand what you think breaks the ABI.
We don't need to add a new mutex -- we can use the __rw_facet
member for the locking. Or did you mean something else?

Martin



--Stefan





Re: STDCXX-1056 [was: Re: STDCXX forks]

2012-09-05 Thread Martin Sebor

Incidentally, a benchmark that I would expect to be more
representative of real programs than the test we've been
using is one that creates a small number of named locales
up front and then uses each in a loop in each thread (as
opposed to creating and destroying it in each iteration
and thread). With this setup, the per-object mutex
implementation should be much faster than the one with
the per-class mutex because of the absence of contention.

The test has an option that exercises this setup:
  --shared-locale

Martin

On 09/05/2012 09:51 PM, Martin Sebor wrote:

On 09/05/2012 09:03 PM, Stefan Teleman wrote:

On Wed, Sep 5, 2012 at 10:55 PM, Martin Sebor mse...@gmail.com wrote:


I suspect the difference is due to the overhead of the repeated
initialization and destruction of the per-object mutex in the
test. The test repeatedly creates (and discards) named locale
objects.

The per-class mutex is initialized just once in the process, no
matter how many facet objects (how many distinct named locales)
the test creates. But the per-object mutex must be created (and
destroyed) for each named locale.


Agreed.

But: if the choice is between an implementation which [1] breaks ABI
and [2] performs 20% worse -- even in contrived test cases -- than
another implementation [2] which doesn't break ABI, and performs
better than the first one,  why would we even consider the first one?


Breaking the ABI is not an option (unless we rev the version).
But I'm not sure I understand what you think breaks the ABI.
We don't need to add a new mutex -- we can use the __rw_facet
member for the locking. Or did you mean something else?

Martin



--Stefan







Re: STDCXX-1056 [was: Re: STDCXX forks]

2012-09-04 Thread Liviu Nicoara

On Sep 4, 2012, at 10:34 AM, Stefan Teleman wrote:

 On Tue, Sep 4, 2012 at 7:35 AM, Liviu Nicoara nikko...@hates.ms wrote:
 
 By no means I am dismissing it. It is at the very least an issue of
 efficiency. I will try an Intel C++ build on x86_64 at some point today.
 What build type was that? I also notice that you have 4.2.1 in your path?
 Are you building out of 4.2.1. tag? I built off 4.2.x branch which also has
 support for custom timeouts (--soft-timeout) in rw_test.
 
 Yes, this is 4.2.1 with all the Solaris  patches which can be applied
 on Linux. The environment is:
 
 # INFO (S1) (10 lines):
 # TEXT:
 # COMPILER: Intel C++, __INTEL_COMPILER = 1210,
 __INTEL_COMPILER_BUILD_DATE = 20111011, __EDG_VERSION__ = 403
 # ENVIRONMENT: pentiumpro running linux-elf (Fedora release 17 (Beefy
 Miracle) (3.5.0-2.fc17.x86_64)) with glibc 2.15
 # FILE: 22.locale.numpunct.mt.cpp
 # COMPILED: Sep  4 2012, 09:11:36
 # COMMENT: thread safety
 
 
 # CLAUSE: lib.locale.numpunct
 
 [ ... snip ... ]
 
 With all the thread-safety Solaris patches for 1056 applied. The test
 runs to completion:
 
 # NOTE (S2) (5 lines):
 # TEXT: executing locale -a  /tmp/tmpfile-Cej8JU
 # CLAUSE: lib.locale.numpunct
 # FILE: process.cpp
 # LINE: 276
 
 # INFO (S1) (3 lines):
 # TEXT: testing std::numpunctcharT with 8 threads, 20 iterations
 each, in 32 locales { C aa_DJ aa_DJ.iso88591 aa_DJ.utf8
 aa_ER aa_ER@saaho aa_ER.utf8 aa_ER.utf8@saaho aa_ET
 aa_ET.utf8 af_ZA af_ZA.iso88591 af_ZA.utf8 am_ET
 am_ET.utf8 an_ES an_ES.iso885915 an_ES.utf8 ar_AE
 ar_AE.iso88596 ar_AE.utf8 ar_BH ar_BH.iso88596 ar_BH.utf8
 ar_DZ ar_DZ.iso88596 ar_DZ.utf8 ar_EG ar_EG.iso88596
 ar_EG.utf8 ar_IN ar_IN.utf8 }
 # CLAUSE: lib.locale.numpunct
 
 [ ... snip ... ]
 
 # +---+--+--+--+
 # | DIAGNOSTIC|  ACTIVE  |   TOTAL  | INACTIVE |
 # +---+--+--+--+
 # | (S1) INFO |   11 |   11 |   0% |
 # | (S2) NOTE |1 |1 |   0% |
 # | (S8) ERROR|0 |3 | 100% |
 # | (S9) FATAL|0 |1 | 100% |
 # +---+--+--+--+
 real 2139.31
 user 2406.09
 sys 155.61
 [steleman@darthvader][/src/steleman/programming/stdcxx-intel/stdcxx-4.2.1-thread-safe/build/tests][09/04/2012
 9:50:14][1300]
 
 These tests running to completion (with the patches applied) are 100%
 reproducible on every single run, on Linux and Solaris (Intel and
 SPARC).

That is good, right?

 
 The timings are the output of /usr/bin/time -p.
 
 Yes, ~40 minutes wall clock run time for one test is quite a lot. But
 it runs to completion, with zero errors. And when taking into
 consideration that there are 8 threads with 20 iterations, maybe
 it's not that much. FWIW, SELinux is also fully enabled in my
 environment.


Ok, so that clears (this build at least), maybe inefficient but runs to 
completion with no crashes.


 
 The timeouts can also be a symptom of race conditions/deadlocks or
 attempting to access invalid thread stacks, or partially written data.


I would look at timeouts from the standpoint of the progression of the run: if 
the run does not progress at all, as in threads are deadlocked, then I would 
consider that a defect. If the run progresses, it is a mere inefficiency, 
however bad.


 It's very platform specific. On Solaris SPARC (either 32-bit or
 64-bit) I never get timeouts, I always get either SEGV or SIGBUS. On
 Intel it appears that timeouts are the prevailing symptom.

Could not get an Intel build off the ground on the account of the LIMITS.cpp 
test not running to completion because of a possible compiler bug. I posted 
earlier an account of it. Do you have a support account that allows posting bug 
reports for Intel's C++ compiler?


 
 I am not sure what an explicit run timeout would add, except hiding
 the hang/deadlock/memory corruption problem behind a timeout.


The timeout option allowed me to run the program with a larger value, such that 
the alarm would not trigger and allow the program to run to completion. Before 
enlarging the timeout value, the individual tests would time out.

I did not have access to a Solaris box since I left Rogue Wave. Does Sun/Oracle 
have any program that would allow one to test drive their compiler on a shared 
box somewhere? I vaguely remember that HP had something like that a while ago.

Thanks!

Liviu

Re: STDCXX-1056 [was: Re: STDCXX forks]

2012-09-04 Thread Martin Sebor

I recall discussing this when you opened the defect but I'm not
sure what the outcome of the discussion was.

I looked at the code some more just now and I agree with you that
(at least the numpunct) facet isn't thread safe. The premise was
that we didn't need any locking because the facet never changes
after it's initialized, and the same data member can safely be
assigned multiple times.

The problem is that the string data members must be initialized
before they can be assigned, and assignment to the same member
must be guarded against concurrent accesses, there seems to be
no mechanism that guarantees that this will be true when the
same facet is used from within multiple threads.

For example, the function below tries to avoid re-initialization
of _C_grouping but the |= expression can be reordered either by
the compiler or by the hardware to complete before the assignment
to the member. The function also fails to protect the assignment
to the member from taking place simultaneously in multiple threads.

  template class _CharT
  inline string numpunct_CharT::grouping () const
  {
  if (!(_C_flags  _RW::__rw_gr)) {

  numpunct* const __self = _RWSTD_CONST_CAST (numpunct*, this);

  // [try to] get the grouping first (may throw)
  // then set a flag to avoid future initializations
  __self-_C_grouping  = do_grouping ();
  __self-_C_flags|= _RW::__rw_gr;
  }

  return _C_grouping;
  }

We need a fix but I don't think the one in the patch attached
to the issue is a good solution. Locking all objects of the
template specialization is far too coarse. Even having a lock
per object would kill iostream performance. Creating a deep
copy on return from each of the functions would also slow
things down noticeably.

The efficient solution must avoid locking in the common case
(i.e., after the facet has been initialized). It can avoid
locking around the POD data members altogether since those can
safely be assigned multiple times (on common hardware), but it
needs to ensure that the string data members are initialized
before they are assigned to and are assigned at most once
simultaneously. If we introduce a lock, it must be per object
and not per type.

With that, I would expect the following function to fix the
problem in the one above. Let me know what you think.

  template class _CharT
  inline string numpunct_CharT::grouping () const
  {
  if (!(_C_flags  _RW::__rw_gr)) {

  numpunct* const __self = _RWSTD_CONST_CAST (numpunct*, this);

 _RWSTD_MT_GUARD (_C_mutex);

 // [try to] get the grouping first (may throw)
  // then set a flag to avoid future initializations
  __self-_C_grouping  = do_grouping ();
  __self-_C_flags|= _RW::__rw_gr;
  }

  return _C_grouping;
  }

Martin

On 09/04/2012 07:40 PM, Stefan Teleman wrote:

On Tue, Sep 4, 2012 at 9:15 PM, Liviu Nicoaranikko...@hates.ms  wrote:


That is good, right?


Yes, it is good. :-)



Could not get an Intel build off the ground on the account of the LIMITS.cpp 
test not running to completion because of a possible compiler bug. I posted 
earlier an account of it. Do you have a support account that allows posting bug 
reports for Intel's C++ compiler?


Oh, yes I remember that now, it happened to me too. I worked around it
by copying the LIMITS executable from a gcc build.

I don't have a support contract with Intel - I'm using the free
compilers at home. I'm guessing Intel must have an open forum for
Intel developers using their compilers, so that might be a way to
report the bug. But I haven't looked for it.




I did not have access to a Solaris box since I left Rogue Wave. Does Sun/Oracle 
have any program that would allow one to test drive their compiler on a shared 
box somewhere? I vaguely remember that HP had something like that a while ago.


I know Sun had such a program. Oracle I don't really know but my
educated guess is not a chance. ;-)

One way of testing with SunPro is to download the free version of
SunPro from Oracle, and use it on Linux (yes, SunPro exists for Linux
as well). It works very well on OpenSUSE and Ubuntu (tested and used
on both myself, debugged stdcxx on OpenSUSE with SunPro 12.2. Doesn't
work at all on Fedora 17 because of the TLS glibc bug (however it used
to on earlier Fedora releases).

And now for the latest about 1056:

I managed (after many, MANY tries) to get a run of
22.locale.numpunct.mt on Solaris SPARC 32-bit build (at work), and
without the thread-safety patches applied, and I hit the jackpot:

# INFO (S1) (10 lines):
# TEXT:
# COMPILER: SunPro, __SUNPRO_CC = 0x5100
# ENVIRONMENT: sparc-v8 running sunos-5.11
# FILE: 22.locale.numpunct.mt.cpp
# COMPILED: Sep  4 2012, 16:10:03
# COMMENT: thread safety


# CLAUSE: lib.locale.numpunct

# NOTE (S2) (5 lines):
# TEXT: executing /usr/bin/locale -a  /var/tmp/tmpfile-MWa4bQ
# CLAUSE: 

Re: STDCXX-1056 [was: Re: STDCXX forks]

2012-09-04 Thread Stefan Teleman
On Tue, Sep 4, 2012 at 10:49 PM, Martin Sebor mse...@gmail.com wrote:

   template class _CharT
   inline string numpunct_CharT::grouping () const
   {
   if (!(_C_flags  _RW::__rw_gr)) {

   numpunct* const __self = _RWSTD_CONST_CAST (numpunct*, this);

  _RWSTD_MT_GUARD (_C_mutex);

  // [try to] get the grouping first (may throw)
   // then set a flag to avoid future initializations
   __self-_C_grouping  = do_grouping ();
   __self-_C_flags|= _RW::__rw_gr;
   }

   return _C_grouping;
   }

That's what I wanted to do originally - use a per-object mutext.
Unfortunately the _C_mutex member in rw::__rw_synchronized is static:

struct __rw_synchronized
{
// static so that it takes up no space
static _RWSTD_EXPORT __rw_mutex _C_mutex;

void _C_lock () { }

void _C_unlock () { }

__rw_guard _C_guard () {
return __rw_guard (_C_mutex);
}

and __rw::rw_guard doesn't have an appropriate constructor.

Intel C++ complains about it too:

/src/steleman/programming/stdcxx-intel/stdcxx-4.2.1-thread-safe/include/loc/_numpunct.h(181):
error: no instance of constructor __rw::__rw_guard::__rw_guard
matches the argument list
argument types are: (const __rw::__rw_mutex)
  _RWSTD_MT_GUARD (_C_mutex);
  ^

This works:

template class _CharT
inline string numpunct_CharT::grouping () const
{
if (!(_C_flags  _RW::__rw_gr)) {

numpunct* const __self = _RWSTD_CONST_CAST (numpunct*, this);

_RWSTD_MT_STATIC_GUARD (_Type);

// [try to] get the grouping first (may throw)
// then set a flag to avoid future initializations
__self-_C_grouping  = do_grouping ();
__self-_C_flags|= _RW::__rw_gr;
}

return _C_grouping;
}

Although I'm not sure of the performance implications difference
between _RWSTD_MT_STATIC_GUARD and _RWSTD_MT_CLASS_GUARD for this
particular problem. I'm going with nothing in real life. :-)

And even so, this is still not thread-safe:

Two different threads [ T1 and T2 ], seeking two different locales
[en_US.UTF-8 and ja_JP.UTF-8], enter std::numpunct_CharT::grouping()
at the same time - because they are running on two different cores.
They both test for

if (!(_C_flags  _RW::__rw_gr))

and then -- assuming the expression above evaluates to true -- one of
them wins the mutex [T1], and the other one [T2] blocks on the mutex.

When T1 is done and exits the function, the grouping is set to
en_US.UTF-8 and the mutex is released. Now T2 acquires the mutex, and
proceeds to setting grouping to ja-JP.UTF-8. Woe on the caller running
from T1 who now thinks he got en_US.UTF-8, but instead he gets
ja_JP.UTF-8, which was duly set so by T2, but T1 had no clue about it
(remember, the std::string grouping _charT buffer is shared by the
caller from T1 and the caller from T2).

So at a minimum, the locking must happen before evaluating the

if (!(_C_flags  _RW::__rw_gr))

expression.

This still doesn't solve what ends up being returned in grouping. If
we lock at the top of the function, then, when T2 acquires the mutex,
the test expression will evaluate to false. Therefore T2 will return
whatever is in grouping right now, which happens to be en_US.UTF-8 as
set by T1, when T2 really wanted ja_JP.UTF-8.

I really think the appropriate fix here -- which would address the
performance implications -- is more complex than this. I am thinking
about creating and using a (non-publicly accessible) internal locale
cache:

typedef std::mapstd::string, std::locale locale_cache;

where all the locales are stored fully initialized, on demand. There
is only one locale instantiation and initialization overhead cost per
locale. After a locale has been instantiated and placed into the
cache, the caller of any specfic locale gets a copy from the cache,
fully instantiated and initialized. But this breaks ABI, so I'm
thinking it's for stdcxx 5.

Thoughts?

--Stefan

-- 
Stefan Teleman
KDE e.V.
stefan.tele...@gmail.com


Re: STDCXX-1056 [was: Re: STDCXX forks]

2012-09-03 Thread Stefan Teleman
On Mon, Sep 3, 2012 at 11:57 PM, Stefan Teleman
stefan.tele...@gmail.com wrote:
 On Mon, Sep 3, 2012 at 3:19 PM, Liviu Nicoara nikko...@hates.ms wrote:

 I tried, unsuccessfully, to reproduce the failure observed by Martin in 
 22.locale.moneypunct.mt, in both debug and optimized, wide and narrow builds 
 on a 16x machine:


 I can reproduce it consistently on Solaris without the patches. It has
 a very high failure rate. It's also been reported at least once
 before, here:

 http://old.nabble.com/22.locale.numpunct.mt-run-hangs-td20133013.html

 I remember there were more hits about 22.locale.numpunct.mt in Google
 when I searched for it a while ago, but I didn't look as closely now.

 I'll create a Solaris build without my patches tomorrow and send the output.

FWIW, my today's builds without the patches with the Intel compiler:

COMPILER: Intel C++, __INTEL_COMPILER = 1210,
__INTEL_COMPILER_BUILD_DATE = 20111011, __EDG_VERSION__ = 403

22.locale.numpunct.mt ran for 3 hours (wall clock time) without ever
completing or doing anything except flatlining the cpu at 115%, on
both 32-bit and 64-bit. This is on:

[steleman@darthvader][/src/steleman/programming/stdcxx-intel/stdcxx-4.2.1-thread-safe/build/tests][09/04/2012
0:21:13][1181] uname -a
Linux darthvader.stefanteleman.org 3.5.0-2.fc17.x86_64 #1 SMP Mon Jul
30 14:48:59 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

I can't (yet) rebuild with GCC 4.7.0 because Fedora 17's build of GCC
C++ is a mess (libsupc++.a requires TLS but glibc was built with TLS
disabled).

--Stefan

-- 
Stefan Teleman
KDE e.V.
stefan.tele...@gmail.com


Re: STDCXX forks

2012-09-01 Thread Liviu Nicoara

On 08/31/12 12:20, Stefan Teleman wrote:

On Fri, Aug 31, 2012 at 8:40 AM, Liviu Nicoara nikko...@hates.ms wrote:


Stefan's seem like a complete git-ification of the whole Apache repository
but with no changes I could detect.


Not quite. :-)
[...]
The official Oracle port for Solaris 10 and 11, which I maintain, is here:

http://src.opensolaris.org/source/xref/userland/gate/components/stdcxx/


Hi Stefan, I have went through the patches. Specifically, I have spent more 
time looking in the mutex alignment changes and the C++ C library headers 
patches, and I only read the others. In order:

The test extensions seem to be genuine by and large, but I would further 
analyze them after I find out what is it they are addressing (test cases?).

The regression tests whose names contain references to internal bug numbers 
require a bit more analysis as to their usefulness. Of course an explanation 
attached to each would alleviate duplicating your work. I have not 
cross-checked them to JIRA.

Some of the compiler characterizations changes, as well as the associated 
GNUmakefile's, seem to be specific to your port, e.g., GNUmakefile, 
GNUmakefile.cfg changes. I may have spotted other issues but I would wait for 
your feed-back first.

The C++ C library headers seem to have been re-written to your port. I am 
unsure why you needed this, but it surely breaks the original intent for these 
headers' structure. I have also noticed that you stripped the Apache notice and 
added an Oracle copyright notice on them.

This pretty much sums up my first impression.

If you were willing to submit them one by one (*), complete with test cases and 
complete patches (ChangeLog entries and all) I would put in the necessary 
review time and provide feed-back asap. Please let me know if I missed anything.

I sincerely hope this helps.

Thanks!

Liviu

 
(*) As in issue by issue, not file by file.


Re: STDCXX forks

2012-09-01 Thread Liviu Nicoara

On 08/31/12 08:58, C. Bergström wrote:

On 08/31/12 07:40 PM, Liviu Nicoara wrote:

Please correct me if I am wrong.

I have seen two forks on github, one belonging to C. Bergstrom/Pathscale and 
the other to Stefan Teleman.

The first one contains a number of genuine code changes outside the Apache 
repository, but without canonical and meaningful commit comments, and without 
accompanying test cases.

We can help clean this up if it makes a real difference

Some carry what seem to be bug id's from a private bug database. Some of the 
changes, e.g., 84d01405124b8c08bb43fdc3755ada2592449727

iirc we renamed the files because of some problem with cmake - it was trying to 
compile those files instead of treating them like headers.

, fa9a45fb36e45b71ba6b4ae93b50bd6679549420, seem odd.

I forget why we did this tbh - I'll try to get an answer in case it comes up 
again as a question

Are you guys dropping support for any  platforms?

Not intentionally - We are using/testing this on Solaris, FBSD and Linux - We 
may add OSX/Win to that list soon, but I can't promise at this point.  For 
targets we only are able to test x86 and x86_64


Hi Christopher,

I finished looking though the github changes. If you are interested in 
contributing some of those changes back, as complete patches [1], I would 
volunteer my time to promptly review them and work with you to get them in, as 
needed.

In the meantime, if you will, could you please elaborate on the two changes 
mentioned above? The first one changes the traditional file extension on the 
template implementation files, the second affects the library building on 
platforms like I mentioned in another post. AFAIK STDCXX does not implement or 
follow Boost conventions and/or library file structure.

HTH.

Thanks!

Liviu

[1] http://stdcxx.apache.org/bugs.html#patch_format


Re: STDCXX forks

2012-09-01 Thread Stefan Teleman
On Sat, Sep 1, 2012 at 12:15 PM, Liviu Nicoara nikko...@hates.ms wrote:

 Hi Stefan, I have went through the patches. Specifically, I have spent more
 time looking in the mutex alignment changes and the C++ C library headers
 patches, and I only read the others. In order:

 The test extensions seem to be genuine by and large, but I would further
 analyze them after I find out what is it they are addressing (test cases?).

 The regression tests whose names contain references to internal bug numbers
 require a bit more analysis as to their usefulness. Of course an explanation
 attached to each would alleviate duplicating your work. I have not
 cross-checked them to JIRA.

 Some of the compiler characterizations changes, as well as the associated
 GNUmakefile's, seem to be specific to your port, e.g., GNUmakefile,
 GNUmakefile.cfg changes. I may have spotted other issues but I would wait
 for your feed-back first.

 The C++ C library headers seem to have been re-written to your port. I am
 unsure why you needed this, but it surely breaks the original intent for
 these headers' structure. I have also noticed that you stripped the Apache
 notice and added an Oracle copyright notice on them.

 This pretty much sums up my first impression.

Hi

Yes, you are correct. :-)

To begin with: the compiler flags/GNUmakefile changes are very
specific to the SunPro compilers and to our internal build system.
These changes are most likely not suitable for inclusion in the
canonical stdcxx, except maybe for the sunpro.config changes, in case
someone would like to be able to replicate our builds. I'd like to
mention that, in Solaris, Apache stdcxx is a system library.

About the Standard C Library forwarding header files: these changes
are specfic to Solaris. The reason behind them is: the Solaris
architectural rules, which can be best summarized as: there can be
only one of each. In other words, it is Verboten, in Solaris, to
duplicate the Standard C Library header files (or any other header
file for that matter). The Solaris Standard C Library header files are
C++-clean - they are required to be so, by the same architectural
rules. Again, these changes are specific to Solaris, and are probably
not portable across other implementations. I know for a fact that they
are not portable for either the GCC or Intel compilers (with which I
test regularly on Linux, in addition to SunPro).

So these two groups of changesets can be ignored.

I opened yesterday STDCXX-1066:

https://issues.apache.org/jira/browse/STDCXX-1056

about the pthread_mutex_t/pthread_cond_t alignment on SPARCV8. I'll
have patches done this weekend. Achtung: the patchset is very large
and touches a very large number of files. It's strange that I didn't
get an email about STDCXX-1066.

I'd also like to talk about STDCXX-1056:

https://issues.apache.org/jira/browse/STDCXX-1056

which has already had an initial discussion, and for which I have
attached  a patch. This issue also addresses (indirectly) linkage when
building with GCC. On the recent versions of GCC that I have tested
with, passing -supc++ on link line automatically links with the GNU
libstdc++6.so (on top of linking with stdcxx), and that just bad.

And then I'll have to cross reference the patches which refer to our
internal bug numbers because most of them are quite old and right now,
off the top of my head, I can't remember what they are. :-)

--Stefan

-- 
Stefan Teleman
KDE e.V.
stefan.tele...@gmail.com


Re: STDCXX forks

2012-09-01 Thread Liviu Nicoara

On Sep 1, 2012, at 1:52 PM, Stefan Teleman wrote:

 On Sat, Sep 1, 2012 at 12:15 PM, Liviu Nicoara nikko...@hates.ms wrote:
 [...]
 This pretty much sums up my first impression.
 
 [...]
 To begin with: the compiler flags/GNUmakefile changes are very
 specific to the SunPro compilers and to our internal build system.
 These changes are most likely not suitable for inclusion in the
 canonical stdcxx, except maybe for the sunpro.config changes, in case
 someone would like to be able to replicate our builds. I'd like to
 mention that, in Solaris, Apache stdcxx is a system library.

Good point. 

 
 About the Standard C Library forwarding header files: these changes
 are specfic to Solaris. The reason behind them is: the Solaris
 architectural rules, which can be best summarized as: there can be
 only one of each. In other words, it is Verboten, in Solaris, to
 duplicate the Standard C Library header files (or any other header
 file for that matter). The Solaris Standard C Library header files are
 C++-clean - they are required to be so, by the same architectural
 rules. Again, these changes are specific to Solaris, and are probably
 not portable across other implementations. I know for a fact that they
 are not portable for either the GCC or Intel compilers (with which I
 test regularly on Linux, in addition to SunPro).
 
 So these two groups of changesets can be ignored.

Got it.

 
 I opened yesterday STDCXX-1066:
 
 https://issues.apache.org/jira/browse/STDCXX-1056
 
 about the pthread_mutex_t/pthread_cond_t alignment on SPARCV8. I'll
 have patches done this weekend. Achtung: the patchset is very large
 and touches a very large number of files. It's strange that I didn't
 get an email about STDCXX-1066.

Acknowledged. I have seen the individual patches on those two websites.

 
 I'd also like to talk about STDCXX-1056:
 
 https://issues.apache.org/jira/browse/STDCXX-1056
 
 which has already had an initial discussion, and for which I have
 attached  a patch. This issue also addresses (indirectly) linkage when
 building with GCC. On the recent versions of GCC that I have tested
 with, passing -supc++ on link line automatically links with the GNU
 libstdc++6.so (on top of linking with stdcxx), and that just bad.

I briefly looked at it, will delve into it later.

 
 And then I'll have to cross reference the patches which refer to our
 internal bug numbers because most of them are quite old and right now,
 off the top of my head, I can't remember what they are. :-)

Thanks!

Liviu



Re: STDCXX forks

2012-08-31 Thread C. Bergström

On 08/31/12 07:40 PM, Liviu Nicoara wrote:

Please correct me if I am wrong.

I have seen two forks on github, one belonging to C. 
Bergstrom/Pathscale and the other to Stefan Teleman.


The first one contains a number of genuine code changes outside the 
Apache repository, but without canonical and meaningful commit 
comments, and without accompanying test cases.

We can help clean this up if it makes a real difference
Some carry what seem to be bug id's from a private bug database. Some 
of the changes, e.g., 84d01405124b8c08bb43fdc3755ada2592449727
iirc we renamed the files because of some problem with cmake - it was 
trying to compile those files instead of treating them like headers.

, fa9a45fb36e45b71ba6b4ae93b50bd6679549420, seem odd.
I forget why we did this tbh - I'll try to get an answer in case it 
comes up again as a question

Are you guys dropping support for any  platforms?
Not intentionally - We are using/testing this on Solaris, FBSD and Linux 
- We may add OSX/Win to that list soon, but I can't promise at this 
point.  For targets we only are able to test x86 and x86_64


NetBSD also has a fork I believe, but not sure if it's public/status


Stefan's seem like a complete git-ification of the whole Apache 
repository but with no changes I could detect.
He has quite a number of patches and forget where those are kept.  I'm 
guessing a lot of his fixes target KDE/Qt apps and the Perennial C++VS 
testsuite.

http://www.peren.com/pages/cppvs_set.htm




Re: STDCXX forks

2012-08-31 Thread Liviu Nicoara

On 08/31/12 08:58, C. Bergström wrote:

On 08/31/12 07:40 PM, Liviu Nicoara wrote:

Please correct me if I am wrong.

I have seen two forks on github, one belonging to C. Bergstrom/Pathscale and 
the other to Stefan Teleman.

The first one contains a number of genuine code changes outside the Apache 
repository, but without canonical and meaningful commit comments, and without 
accompanying test cases.

We can help clean this up if it makes a real difference


That is appreciated. It would make a huge difference to anybody coming anew to 
the project or just catching up, like me.


Some carry what seem to be bug id's from a private bug database. Some of the 
changes, e.g., 84d01405124b8c08bb43fdc3755ada2592449727

iirc we renamed the files because of some problem with cmake - it was trying to 
compile those files instead of treating them like headers.

, fa9a45fb36e45b71ba6b4ae93b50bd6679549420, seem odd.

I forget why we did this tbh - I'll try to get an answer in case it comes up 
again as a question


My point exactly, a good comment explains the why's.


Are you guys dropping support for any platforms?

Not intentionally - We are using/testing this on Solaris, FBSD and Linux - We 
may add OSX/Win to that list soon, but I can't promise at this point. For 
targets we only are able to test x86 and x86_64


The reason I am asking is that changes to the template implementation files have most 
likely broken the support for compilers that do implicit inclusion of 
template implementations (a.k.a. link-time instantiation of templates). I am not 
incriminating anyone mind you, I am just trying to point out there mightbe a gap in your 
testing of your changes.



NetBSD also has a fork I believe, but not sure if it's public/status


Do you know where that is?



Stefan's seem like a complete git-ification of the whole Apache repository but 
with no changes I could detect.

He has quite a number of patches and forget where those are kept. I'm guessing 
a lot of his fixes target KDE/Qt apps and the Perennial C++VS testsuite.
http://www.peren.com/pages/cppvs_set.htm



I briefly looked over some patches, there's quite a few of them. I plan to go 
in more detail over the week-end.

Cheers.

L


Re: STDCXX forks

2012-08-31 Thread C. Bergström

On 08/31/12 08:10 PM, Liviu Nicoara wrote:




NetBSD also has a fork I believe, but not sure if it's public/status


Do you know where that is?

He just posted his bulk patch
http://www.netbsd.org/~joerg/stdcxx.diff

There's a small amount of CVS noise and we already have one part on our 
github.  I don't have more information


Re: STDCXX forks

2012-08-31 Thread Stefan Teleman
On Fri, Aug 31, 2012 at 8:40 AM, Liviu Nicoara nikko...@hates.ms wrote:

 Stefan's seem like a complete git-ification of the whole Apache repository
 but with no changes I could detect.

Not quite. :-)

You are - most likely referring to the svn repo at CVSDude here:

http://kdesolaris-svn.cvsdude.com/trunk/STDCXX/4.2.1/

Yes I maintain that fork. All the patches are in this directory:

http://kdesolaris-svn.cvsdude.com/trunk/STDCXX/4.2.1/Solaris/diffs/

and they apply with the apply_patches.sh script from here:

http://kdesolaris-svn.cvsdude.com/trunk/STDCXX/4.2.1/Solaris/apply_patches.sh

The official Oracle port for Solaris 10 and 11, which I maintain, is here:

http://src.opensolaris.org/source/xref/userland/gate/components/stdcxx/

This is the source code which is used to build stdcxx on Solaris 10
and 11. The patches for stdcxx on Solaris are here:

http://src.opensolaris.org/source/xref/userland/gate/components/stdcxx/patches/

The official Solaris 11 stdcxx package can be installed on Solaris 11 from here:

http://pkg.oracle.com/solaris/release/en/search.shtml?token=stdcxxaction=Search

For Solaris 10, you need to install SUNWlibstdcxx4 - which contains
the stdcxx library and its header files - and SUNWlibstdcxx4S which
contains the source code plus all the patches, and which installs in
/usr/share/src/. It installs on Solaris 10 Update 10 and later.

I maintain the repo at cvsdude on an ad-hoc basis. That means now and then. :-)

The source code repo at Oracle is constantly maintained at Oracle, and
we publish source code drops every two weeks there (I think).

--Stefan

-- 
Stefan Teleman
KDE e.V.
stefan.tele...@gmail.com


Re: STDCXX forks

2012-08-31 Thread Jim Jagielski

On Aug 31, 2012, at 8:40 AM, Liviu Nicoara nikko...@hates.ms wrote:

 
 Stefan's seem like a complete git-ification of the whole Apache repository 
 but with no changes I could detect.
 

FWIW, The ASF supports git so if people think it would help,
all we'd need to do is ask #infra to move stdcxx to git.



Re: STDCXX forks

2012-08-31 Thread Stefan Teleman
On Fri, Aug 31, 2012 at 8:58 AM, C. Bergström
cbergst...@pathscale.com wrote:

 He has quite a number of patches and forget where those are kept.  I'm
 guessing a lot of his fixes target KDE/Qt apps and the Perennial C++VS
 testsuite.
 http://www.peren.com/pages/cppvs_set.htm

Correction: my patches aren't related to Qt/KDE at all at this point -
they are related to Solaris. Apache stdcxx is part of Solaris Core.

So the authoritative patchset for Solaris/stdcxx is the one
publisehd by Oracle, since it is kept up-to-date and it represents an
officially released and supported version of stdcxx.

--Stefan

-- 
Stefan Teleman
KDE e.V.
stefan.tele...@gmail.com