A real-world integrator with the highest performance expectations is exactly the sort of person I think we should be talking with about getting real-world data on River bottlenecks.

Any input? Possibilities for e.g. running with performance-specific logging?

Patricia


Mike McGrady wrote:
I just want to add that, as a "real world" integrator with the highest
performance expectations, I rely more on the architecture and design
of systems to solve scalability problems.  A poor design can sometimes
be made to scale, but it runs into two ceilings: not only performance
but also cost.

Sent from my iPhone

Michael McGrady
Principal Investigator, AF081_028 SBIR
Chief Architect, Topia Technology, Inc
Work 1.253.572.9712  Cell 1.253.720.3365

On Nov 30, 2010, at 3:59 AM, Patricia Shanahan <p...@acm.org> wrote:

On 11/30/2010 1:43 AM, Peter Firmstone wrote:
Patricia Shanahan wrote:
Tom Hobbs wrote:
Yes, you're right.

I knew about the non-atomicity of ++, my concern was a call
to reset creeping in between the two parts of that operation.

That is a very good point.

Leaving reset unsynchronized, even with volatile, would lead to
 results that would not be possible with full synchronization.
Suppose thread A is going to do a reset, and thread B is going
to do an increment, and everybody agrees the count is currently
10.
....
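Spelling out one such interleaving (this is an illustrative counter, not River code): thread B's `count++` is a read followed by a write, so a reset that lands between the two is simply overwritten. The sketch below simulates that interleaving deterministically in a single thread to show the outcome, which full synchronization could never produce.

```java
// Hypothetical counter illustrating the race described above. With an
// unsynchronized volatile field, "count++" compiles to a read followed
// by a write; a reset() between the two steps is lost.
public class ResetRace {
    static volatile int count = 10;

    public static void main(String[] args) {
        // Simulate the problematic interleaving deterministically:
        int read = count;    // thread B reads 10 (first half of count++)
        count = 0;           // thread A performs reset()
        count = read + 1;    // thread B writes 11 -- the reset is lost
        System.out.println(count); // prints 11; under full synchronization
                                   // only 0, 1, or 11-before-reset orders
                                   // ending in 0 or 1 would be possible
    }
}
```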
Ah yes, correct, my mistake; easy to stuff up, isn't it? ;)

In essence I agree. Unfortunately, we don't know when we need the
performance, because we can't test scalability. I've only got 4
threads!
I've only got 8 threads on my largest system.

I am very, very strongly opposed to attempting performance tuning
without measurement. I've seen it tried many times over several
decades, and it is always a disaster. I've compared, for example, OS
developers' estimates of where the bottlenecks in their operating
system would be against actual measurements, and I've yet to see the
developers get it right. That includes my own efforts at guessing
bottlenecks, before I learned that it is futile.

Without measurement, you either get focused effort in entirely
the wrong places or diffuse effort spread over the whole system.
What is really needed is focused effort on a few real bottlenecks
that will go way beyond any set of rules that could reasonably be
applied throughout the system.

I believe the solution is to establish a working relationship with
users of River who have larger systems. They have a vested interest
in helping us make it more scalable.

Here are some simple tips I've found useful:
Have you measured the effectiveness of these tips on real world
scalability? What sort of gain do you see?

There may also be opportunities in the area of data structure and
algorithm scalability. For example, TaskManager uses an ArrayList
to represent something that is basically a FIFO with some
reordering. That is a good idea if, and only if, the queue lengths
are always very short even on large systems. However, I have no
idea whether TaskManager queue lengths tend to increase on large
systems or not.
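To make the concern concrete (this is an illustration, not TaskManager's actual code): removing from the front of an ArrayList shifts every remaining element, so draining a queue that way costs O(n) per removal, while an ArrayDeque dequeues in constant time. For very short queues the difference is negligible, which is exactly why queue-length measurements on large systems are needed before changing the data structure.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;

// Hypothetical comparison of FIFO drain cost for the two structures.
public class QueueDrain {
    static long drainListFront(int n) {
        ArrayList<Integer> list = new ArrayList<Integer>();
        for (int i = 0; i < n; i++) list.add(i);
        long t0 = System.nanoTime();
        while (!list.isEmpty()) list.remove(0); // shifts remaining elements
        return System.nanoTime() - t0;
    }

    static long drainDeque(int n) {
        ArrayDeque<Integer> deque = new ArrayDeque<Integer>();
        for (int i = 0; i < n; i++) deque.add(i);
        long t0 = System.nanoTime();
        while (!deque.isEmpty()) deque.poll(); // constant-time removal
        return System.nanoTime() - t0;
    }

    public static void main(String[] args) {
        int n = 50000;
        System.out.println("ArrayList drain:  " + drainListFront(n) + " ns");
        System.out.println("ArrayDeque drain: " + drainDeque(n) + " ns");
    }
}
```

(Note that ArrayDeque is itself "Since 1.6", so the same version question discussed below applies to it.)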

1. If all mutators are atomic and don't depend on previous state,
a volatile reference or field may be sufficient. But now that we
have the concurrency utilities, why not use an atomic reference or
field instead? Then if we later find we need a method based on
previous state, it's easily proven correct.
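As a sketch of that suggestion (a hypothetical counter class, not River code): an AtomicInteger gives atomic increment and reset now, and if a state-dependent update is needed later, the standard compare-and-set retry loop handles it without introducing locks.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical counter using java.util.concurrent.atomic (Since 1.5).
public class Counter {
    private final AtomicInteger count = new AtomicInteger();

    public int increment() {
        return count.incrementAndGet(); // atomic read-modify-write
    }

    public void reset() {
        count.set(0); // a single atomic write; cannot be "lost"
                      // partway through an increment
    }

    // A later method that DOES depend on previous state:
    // the standard compare-and-set retry loop.
    public boolean incrementIfBelow(int limit) {
        for (;;) {
            int current = count.get();
            if (current >= limit) return false;
            if (count.compareAndSet(current, current + 1)) return true;
        }
    }
}
```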
Do we have the concurrency utilities? The java.util.concurrent
packages are all "Since 1.5", and some of their classes are "Since
1.6". We can only use them if we are abandoning any chance of River
running with a 1.4 rt.jar.

To my mind, the advances in java.util and its sub-packages are a
really strong motivation for getting to 1.5 or, better, 1.6 ASAP.

Currently, a quick grep indicates java.util.concurrent is only used
in ./qa/src/com/sun/jini/qa/harness/HeartOfTheMachine.java, which
is part of the QA infrastructure, not the production system.

...

Have you got any examples of a formal proof of correctness? Just
out of curiosity?
Unfortunately, the proofs of correctness I've written that related
to concurrency were based on confidential data and written on the
job when I was working as a performance architect for Cray Research
and Sun Microsystems.

I do have some coursework proofs from the UCSD Design of Algorithms
graduate course. I'll dig out an example to send to you directly.

Proof of correctness of large systems is a research topic. I'm
talking about looking very narrowly at a class that has been shown
to be performance-critical and is being subjected to special
performance tuning that uses risky techniques such as volatile.

Patricia


