I just want to add that, as a "real world" integrator with the highest performance expectations, I rely more on the architecture and design of systems to solve scalability problems. A poor design can usually be made to scale, but it runs into two ceilings: not just performance but also cost.
Sent from my iPhone

Michael McGrady
Principal Investigator, AF081_028 SBIR
Chief Architect
Topia Technology, Inc.
Work 1.253.572.9712
Cell 1.253.720.3365

On Nov 30, 2010, at 3:59 AM, Patricia Shanahan <p...@acm.org> wrote:

> On 11/30/2010 1:43 AM, Peter Firmstone wrote:
>> Patricia Shanahan wrote:
>>> Tom Hobbs wrote:
>>>> Yes, you're right.
>>>>
>>>> I knew about the non-atomicity of ++, my concern was a call to reset
>>>> creeping in between the two parts of that operation.
>>>
>>> That is a very good point.
>>>
>>> Leaving reset unsynchronized, even with volatile, would lead to
>>> results that would not be possible with full synchronization. Suppose
>>> thread A is going to do a reset, and thread B is going to do an
>>> increment, and everybody agrees the count is currently 10.
> ....
>> Ah yes, correct, my mistake, easy to stuff up isn't it? ;)
>>
>> In essence I agree, unfortunately we don't know when we need the
>> performance, because we can't test scalability. I've only got 4 threads!
>
> I've only got 8 threads on my largest system.
>
> I am very, very strongly opposed to attempting performance tuning without
> measurement. I've seen it tried many times over several decades, and it is
> always a disaster. I've compared e.g. estimates of where bottlenecks will be
> in an operating system prepared by the OS developers to actual measurements,
> and I've yet to see the developers get it right. That includes my own efforts
> at guessing bottlenecks, before I learned that it is futile.
>
> Without measurement, you either get focused effort in entirely the wrong
> places or diffuse effort spread over the whole system. What is really needed
> is focused effort on a few real bottlenecks that will go way beyond any set
> of rules that could reasonably be applied throughout the system.
>
> I believe the solution is to establish a working relationship with users of
> River who have larger systems. They have a vested interest in helping us make
> it more scalable.
>
>> Here are some simple tips I've found useful:
>
> Have you measured the effectiveness of these tips on real world scalability?
> What sort of gain do you see?
>
> There may also be opportunities in the area of data structure and algorithm
> scalability. For example, TaskManager uses an ArrayList to represent
> something that is basically a FIFO with some reordering. That is a good idea
> if, and only if, the queue lengths are always very short even on large
> systems. However, I have no idea whether TaskManager queue lengths tend to
> increase on large systems or not.
>
>> 1. If all mutators are atomic and don't depend on previous state, a
>> volatile reference or field may be sufficient, but now we have
>> concurrency utilities, why not use an atomic reference or field
>> instead? Then if we find we later need a method based on previous
>> state, it's easily proven correct.
>
> Do we have the concurrency utilities? The java.util.concurrent packages are
> all "Since 1.5", and some of their classes are "Since 1.6". We can only use
> them if we are abandoning any chance of River running with a 1.4 rt.jar.
>
> To my mind, the advances in java.util and its sub-packages are a really
> strong motivation for getting to 1.5 or, better, 1.6 ASAP.
>
> Currently, a quick grep indicates java.util.concurrent is only used in
> ./qa/src/com/sun/jini/qa/harness/HeartOfTheMachine.java, which is part of the
> QA infrastructure, not the production system.
>
> ...
>
>> Have you got any examples of a formal proof of correctness? Just out of
>> curiosity?
>
> Unfortunately, the proofs of correctness I've written that related to
> concurrency were based on confidential data and written on the job when I was
> working as a performance architect for Cray Research and Sun Microsystems.
>
> I do have some coursework proofs from the UCSD Design of Algorithms graduate
> course. I'll dig out an example to send to you directly.
>
> Proof of correctness of large systems is a research topic. I'm talking about
> looking very narrowly at a class that has been shown to be
> performance-critical and is being subjected to special performance tuning
> that uses risky techniques such as volatile.
>
> Patricia
>
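
To make the counter discussion above concrete, here is a minimal sketch. It is not River code; the class and method names are invented for illustration. The first class shows the lost-reset race Tom and Patricia describe; the second uses java.util.concurrent.atomic (available from 1.5, as Patricia notes) to close that window, and also shows how a future "read and reset" operation becomes trivially correct.

import java.util.concurrent.atomic.AtomicInteger;

// count++ is a read-modify-write: read count, add 1, write it back.
// A reset() from another thread can land between the read and the
// write, and the subsequent write silently overwrites the reset.
class VolatileCounter {
    private volatile int count;
    void increment() { count++; }      // not atomic, despite volatile
    void reset()     { count = 0; }
    int  get()       { return count; }
}

// The same counter on AtomicInteger: incrementAndGet() is a single
// atomic operation, so a concurrent set(0) cannot be lost inside it.
class AtomicCounter {
    private final AtomicInteger count = new AtomicInteger();
    void increment()   { count.incrementAndGet(); }
    void reset()       { count.set(0); }
    int  getAndReset() { return count.getAndSet(0); }
    int  get()         { return count.get(); }
}

Whether a counter like this is ever hot enough to justify anything beyond plain synchronization is, of course, exactly the measurement question Patricia raises.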
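
On the TaskManager point, a crude sketch of why the ArrayList-as-FIFO question hinges on queue length. This is hypothetical code, not TaskManager's, and a quick timing loop rather than a proper benchmark: removing from the head of an ArrayList shifts every remaining element, while ArrayDeque (itself "Since 1.6") removes the head in constant time.

import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

class FifoDrain {
    public static void main(String[] args) {
        int n = 50000;

        // Drain an ArrayList from the front: each remove(0) shifts the
        // remaining elements, so the whole drain is O(n^2).
        List<Integer> list = new ArrayList<Integer>();
        for (int i = 0; i < n; i++) list.add(i);
        long t0 = System.nanoTime();
        while (!list.isEmpty()) list.remove(0);
        long t1 = System.nanoTime();

        // Drain an ArrayDeque: poll() is O(1), so the drain is O(n).
        Queue<Integer> queue = new ArrayDeque<Integer>();
        for (int i = 0; i < n; i++) queue.add(i);
        long t2 = System.nanoTime();
        while (queue.poll() != null) { }
        long t3 = System.nanoTime();

        System.out.println("ArrayList drain:  " + (t1 - t0) / 1000000 + " ms");
        System.out.println("ArrayDeque drain: " + (t3 - t2) / 1000000 + " ms");
    }
}

If TaskManager queues really do stay only a handful of entries long, none of this matters, which is again the measurement question.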