Re: multi-core software

2009-06-11 Thread Jon Harrop
Matthias Blume wrote:
 Jeff M. mass...@gmail.com writes:
 But, assuming that your program works and does what it's supposed to,
 I agree with Jon that performance needs to be right near the top of
 the list of concerns. Why? Performance isn't about looking good as a
 programmer, or having fun making a function run in 15 cycles instead
 of 24, or coming up with some neat bit packing scheme so that your app
 now only uses 20K instead of 200K. Performance is - pure and simple -
 about one thing only: money.
 
 Programmer time is vastly more expensive than CPU time, so the
 money argument often leads to slow (low performance) solutions as long
 as they are good enough because developing a faster solution would
 mean spending more valuable programmer time at a cost that cannot
 be recovered over the life cycle of the product in question.

In the context of commercial software, the money to fund developers to
improve performance comes from the huge marketing budget because
performance is usually more about marketing than anything else.

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?u
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: multi-core software

2009-06-10 Thread Jon Harrop
Arved Sandstrom wrote:
 Jon Harrop wrote:
 Arved Sandstrom wrote:
 Jon Harrop wrote:
 No. Concurrent programming is about interleaving computations in order
 to reduce latency. Nothing to do with parallelism.

 Jon, I do concurrent programming all the time, as do most of my peers.
 Way down on the list of why we do it is the reduction of latency.
 
 What is higher on the list?
 
 Correctness.
 
 I'm not being facetious. I write applications that run on application
 servers, and from time to time I have had to write various special
 purpose servers. This kind of programming is all about managing
 concurrent execution of computations. The overarching concern is 
 reliability and correct function. For many corporate situations, even
 with hundreds of users, the actual load at any instant is low enough
 that the various servers involved are nowhere close to being stressed
 out - performance is a secondary issue.

In other words, without concurrency the latency would be so high that you
would consider the program to be wrong. However you cut it, the real reason
is latency.

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?u


Re: multi-core software

2009-06-10 Thread Jeff M.
On Jun 9, 9:08 pm, Arved Sandstrom dces...@hotmail.com wrote:
 Jon Harrop wrote:
 
  Arved Sandstrom wrote:
 
  Jon, I do concurrent programming all the time, as do most of my peers.
  Way down on the list of why we do it is the reduction of latency.

  What is higher on the list?

 Correctness.


IMO, that response is a bit of a cop-out. Correctness is _always_ most
important, no matter what application you are creating; without it,
you don't have a job and the company you work for goes out of
business.

But, assuming that your program works and does what it's supposed to,
I agree with Jon that performance needs to be right near the top of
the list of concerns. Why? Performance isn't about looking good as a
programmer, or having fun making a function run in 15 cycles instead
of 24, or coming up with some neat bit packing scheme so that your app
now only uses 20K instead of 200K. Performance is - pure and simple -
about one thing only: money.

Programs that use more memory require more money for the hardware of
every user. Programs that run slower eat more time per day. If you
have 100,000 users, all doing an operation once per day that takes 20
seconds, being able to shave 5 seconds off that saves 5.78 man-days of
work. Hell, for some applications, that 20 seconds is just startup
time spent at a splash screen. Just imagine if every Google search
took even 5 seconds to resolve, how much time would be wasted every
day around the world - ignoring the fact that Google wouldn't exist if
that were the case ;-). Obviously Google engineers work incredibly
hard every day to ensure correct results, but performance better be
right up there at the top of the list as well.
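
For anyone who wants to check that arithmetic, here is a minimal sketch in C. The function name `saved_days` and the 8-hour working day are illustrative assumptions, not something from the post. The 5.78 figure corresponds to 24-hour days; counted in 8-hour working days the same 500,000 seconds comes to roughly three times as many days.

```c
#include <stdio.h>

/* Total time saved per day, expressed in "days" of a chosen length.
 *   users:           number of users doing the operation once per day
 *   seconds_saved:   seconds shaved off each operation
 *   seconds_per_day: 86400 for a 24-hour day, 28800 for an assumed
 *                    8-hour working day
 */
static double saved_days(double users, double seconds_saved,
                         double seconds_per_day)
{
    return users * seconds_saved / seconds_per_day;
}

/* Print both readings of the 100,000-user, 5-second example. */
static void print_example(void)
{
    printf("24-hour days: %.2f\n", saved_days(100000.0, 5.0, 86400.0));
    printf("8-hour days:  %.2f\n", saved_days(100000.0, 5.0, 28800.0));
}
```

Either way the point stands: small per-operation savings add up quickly across a large user base.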

Jeff M.


Re: multi-core software

2009-06-10 Thread Matthias Blume
Jeff M. mass...@gmail.com writes:

 On Jun 9, 9:08 pm, Arved Sandstrom dces...@hotmail.com wrote:
 Jon Harrop wrote:
 
  Arved Sandstrom wrote:
 
  Jon, I do concurrent programming all the time, as do most of my peers.
  Way down on the list of why we do it is the reduction of latency.

  What is higher on the list?

 Correctness.


 IMO, that response is a bit of a cop-out. Correctness is _always_ most
 important, no matter what application you are creating; without it,
 you don't have a job and the company you work for goes out of
 business.

 But, assuming that your program works and does what it's supposed to,
 I agree with Jon that performance needs to be right near the top of
 the list of concerns. Why? Performance isn't about looking good as a
 programmer, or having fun making a function run in 15 cycles instead
 of 24, or coming up with some neat bit packing scheme so that your app
 now only uses 20K instead of 200K. Performance is - pure and simple -
 about one thing only: money.

Programmer time is vastly more expensive than CPU time, so the
money argument often leads to slow (low performance) solutions as long
as they are good enough because developing a faster solution would
mean spending more valuable programmer time at a cost that cannot
be recovered over the life cycle of the product in question.

That being said, there are plenty of situations where performance
obviously does matter a great deal -- as you correctly pointed out.
(It all depends on the above mentioned product in question and the
nature of its life cycle.)

Matthias


Re: multi-core software

2009-06-10 Thread Paul Rubin
Jon Harrop j...@ffconsultancy.com writes:
  I'm not being facetious. I write applications that run on application
  servers, and from time to time I have had to write various special
  purpose servers. This kind of programming is all about managing
  concurrent execution of computations. The overarching concern is 
  reliability and correct function. For many corporate situations, even
  with hundreds of users, the actual load at any instant is low enough
  that the various servers involved are nowhere close to being stressed
  out - performance is a secondary issue.
 
 In other words, without concurrency the latency would be so high
 that you would consider the program to be wrong. However you cut it,
 the real reason is latency.

I don't think that follows, if there is two-way communication and
dependency between the servers, combined with lack of control over
when any particular server decides to initiate an outgoing request.
Stuff may have to happen concurrently to avoid complete deadlock.


Re: multi-core software

2009-06-10 Thread Jeff M.
On Jun 10, 12:49 pm, Seamus MacRae smacrae...@live.ca.invalid wrote:
 Jeff M. wrote:
  On Jun 9, 9:08 pm, Arved Sandstrom dces...@hotmail.com wrote:
  Jon Harrop wrote:
  Arved Sandstrom wrote:
  Jon, I do concurrent programming all the time, as do most of my peers.
  Way down on the list of why we do it is the reduction of latency.
  What is higher on the list?
  Correctness.

  IMO, that response is a bit of a cop-out. Correctness is _always_ most
  important, no matter what application you are creating; without it,
  you don't have a job and the company you work for goes out of
  business.

 And when, exactly, did Microsoft go out of business? I hadn't heard it
 mentioned in the news. :)

Touché. :)

Jeff M.


Re: multi-core software

2009-06-10 Thread Seamus MacRae

Jeff M. wrote:
 On Jun 9, 9:08 pm, Arved Sandstrom dces...@hotmail.com wrote:
  Jon Harrop wrote:
   Arved Sandstrom wrote:
    Jon, I do concurrent programming all the time, as do most of my peers.
    Way down on the list of why we do it is the reduction of latency.

   What is higher on the list?

  Correctness.

 IMO, that response is a bit of a cop-out. Correctness is _always_ most
 important, no matter what application you are creating; without it,
 you don't have a job and the company you work for goes out of
 business.

And when, exactly, did Microsoft go out of business? I hadn't heard it
mentioned in the news. :)



Re: multi-core software

2009-06-10 Thread Dimiter malkia Stanev

Jeff M. wrote:
 On Jun 9, 9:08 pm, Arved Sandstrom dces...@hotmail.com wrote:
  Jon Harrop wrote:
   Arved Sandstrom wrote:
    Jon, I do concurrent programming all the time, as do most of my peers.
    Way down on the list of why we do it is the reduction of latency.

   What is higher on the list?

  Correctness.

 IMO, that response is a bit of a cop-out. Correctness is _always_ most
 important, no matter what application you are creating; without it,
 you don't have a job and the company you work for goes out of
 business.


PC / video games definitely fall outside that rule about correctness. As long
as the game does not crash your XBOX/PS3/whatever for a certain amount of
time, and behaves well until then, it's fine.

Bugs are already part of the genre.

In reality you can't ship on time; there are always BUGS :)

The most important thing in games (at least for a large percentage of them)
is speed of graphics - fluid 60fps, or stable 30fps.




 But, assuming that your program works and does what it's supposed to,
 I agree with Jon that performance needs to be right near the top of
 the list of concerns. Why? Performance isn't about looking good as a
 programmer, or having fun making a function run in 15 cycles instead
 of 24, or coming up with some neat bit packing scheme so that your app
 now only uses 20K instead of 200K. Performance is - pure and simple -
 about one thing only: money.

 Programs that use more memory require more money for the hardware of
 every user. Programs that run slower eat more time per day. If you
 have 100,000 users, all doing an operation once per day that takes 20
 seconds, being able to shave 5 seconds off that saves 5.78 man-days of
 work. Hell, for some applications, that 20 seconds is just startup
 time spent at a splash screen. Just imagine if every Google search
 took even 5 seconds to resolve, how much time would be wasted every
 day around the world - ignoring the fact that Google wouldn't exist if
 that were the case ;-). Obviously Google engineers work incredibly
 hard every day to ensure correct results, but performance better be
 right up there at the top of the list as well.

 Jeff M.



Re: multi-core software

2009-06-10 Thread Arved Sandstrom

Jon Harrop wrote:
 Arved Sandstrom wrote:
  Jon Harrop wrote:
   Arved Sandstrom wrote:
    Jon Harrop wrote:
     No. Concurrent programming is about interleaving computations in order
     to reduce latency. Nothing to do with parallelism.

    Jon, I do concurrent programming all the time, as do most of my peers.
    Way down on the list of why we do it is the reduction of latency.

   What is higher on the list?

  Correctness.

  I'm not being facetious. I write applications that run on application
  servers, and from time to time I have had to write various special
  purpose servers. This kind of programming is all about managing
  concurrent execution of computations. The overarching concern is
  reliability and correct function. For many corporate situations, even
  with hundreds of users, the actual load at any instant is low enough
  that the various servers involved are nowhere close to being stressed
  out - performance is a secondary issue.

 In other words, without concurrency the latency would be so high that you
 would consider the program to be wrong. However you cut it, the real reason
 is latency.


For a certain group of applications and user loads I would concede that 
point, yes.


For quite a few other situations, you could queue up user requests and 
execute them in order, finishing each before proceeding to the next, and 
users wouldn't even notice. I wrote a J2SE server a few months ago, to 
solve a very specific problem associated with an application running on 
a J2EE server, that could handle dozens of users per second using this 
strategy. I didn't write it that way because doing so is perverse, but 
I could have.
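
The queue-and-run-in-order strategy described above can be sketched as a small timing model in C. Everything here (the function name `worst_latency_ms`, millisecond units, a fixed service time per request) is a hypothetical illustration, not the J2SE server in question. It just shows why a strictly sequential server is fine as long as requests arrive more slowly than they are served.

```c
#include <stddef.h>

/* Model of a strictly sequential server: requests are handled in arrival
 * order, each one finishing before the next starts.
 *   arrival_ms: arrival times in milliseconds, sorted ascending
 *   n:          number of requests
 *   service_ms: assumed fixed time to handle one request
 * Returns the worst delay between a request arriving and completing. */
static long worst_latency_ms(const long *arrival_ms, size_t n, long service_ms)
{
    long server_free_at = 0;
    long worst = 0;

    for (size_t i = 0; i < n; i++) {
        /* A request starts when it has arrived AND the server is idle. */
        long start = arrival_ms[i] > server_free_at ? arrival_ms[i]
                                                    : server_free_at;
        long done = start + service_ms;
        long latency = done - arrival_ms[i];

        if (latency > worst)
            worst = latency;
        server_free_at = done;
    }
    return worst;
}
```

With 20 requests per second (one every 50 ms) and 10 ms of work each, nobody ever waits behind anyone else; latency only grows when requests bunch up faster than they can be served.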


AHS


Re: multi-core software

2009-06-10 Thread Arved Sandstrom

Jeff M. wrote:
 On Jun 9, 9:08 pm, Arved Sandstrom dces...@hotmail.com wrote:
  Jon Harrop wrote:
   Arved Sandstrom wrote:
    Jon, I do concurrent programming all the time, as do most of my peers.
    Way down on the list of why we do it is the reduction of latency.

   What is higher on the list?

  Correctness.

 IMO, that response is a bit of a cop-out. Correctness is _always_ most
 important, no matter what application you are creating; without it,
 you don't have a job and the company you work for goes out of
 business.

 But, assuming that your program works and does what it's supposed to,
 I agree with Jon that performance needs to be right near the top of
 the list of concerns. Why? Performance isn't about looking good as a
 programmer, or having fun making a function run in 15 cycles instead
 of 24, or coming up with some neat bit packing scheme so that your app
 now only uses 20K instead of 200K. Performance is - pure and simple -
 about one thing only: money.

 Programs that use more memory require more money for the hardware of
 every user. Programs that run slower eat more time per day. If you
 have 100,000 users, all doing an operation once per day that takes 20
 seconds, being able to shave 5 seconds off that saves 5.78 man-days of
 work. Hell, for some applications, that 20 seconds is just startup
 time spent at a splash screen. Just imagine if every Google search
 took even 5 seconds to resolve, how much time would be wasted every
 day around the world - ignoring the fact that Google wouldn't exist if
 that were the case ;-). Obviously Google engineers work incredibly
 hard every day to ensure correct results, but performance better be
 right up there at the top of the list as well.

 Jeff M.


Point taken, but I primarily work on internal government and corporate
applications. There might be hundreds or thousands of users at any given
time, but not typically tens or hundreds of thousands. Hundreds or thousands
of users translates to at least an order of magnitude fewer
simultaneous users. Ops people I talk to who monitor apps I've
worked on, or similar apps, rarely report cases where the application
itself feels sluggish to users because it's stressing
out processors or consuming gargantuan amounts of memory.


Typically when latency problems come up, it's because the app has to 
talk to other servers - databases, LDAP, print servers etc. In other 
words, the network is coming into play. In fact when we do work on 
performance, it's typically about minimizing calls to the database or 
other services.


AHS


Re: multi-core software

2009-06-09 Thread Piet van Oostrum
 Seamus MacRae smacrae...@live.ca.invalid (SM) wrote:

SM Piet van Oostrum wrote:
 By the way, there is a series of articles about concurrency on ACM Queue
 which may be interesting for those participating in or just following
 this discussion:
 
 http://queue.acm.org/listing.cfm?item_topic=Concurrency&qc_type=theme_list&filter=Concurrency&page_title=Concurrency
 
 Here is one introductory paragraph from one of the articles:
 
 Parallel programming poses many new challenges to the developer, one of
 which is synchronizing concurrent access to shared memory by multiple
 threads. Programmers have traditionally used locks for synchronization,
 but lock-based synchronization has well-known pitfalls. Simplistic
 coarse-grained locking does not scale well, while more sophisticated
 fine-grained locking risks introducing deadlocks and data races.
 Furthermore, scalable libraries written using fine-grained locks cannot
 be easily composed in a way that retains scalability and avoids deadlock
 and data races. 

SM Is that the one about transactional memory?

Yes
-- 
Piet van Oostrum p...@cs.uu.nl
URL: http://pietvanoostrum.com [PGP 8DAE142BE17999C4]
Private email: p...@vanoostrum.org


Re: multi-core software

2009-06-09 Thread toby
On Jun 7, 2:41 pm, Jon Harrop j...@ffconsultancy.com wrote:
 Arved Sandstrom wrote:
  Jon Harrop wrote:
  I see no problem with mutable shared state.

  In which case, Jon, you're in a small minority.

 No. Most programmers still care about performance

Frequently when they shouldn't.

 and performance means
 mutable state.

Hm, not sure Erlangers would wholly agree.


 --
 Dr Jon D Harrop, Flying Frog Consultancy Ltd.
 http://www.ffconsultancy.com/?u



Re: multi-core software

2009-06-09 Thread Jon Harrop
toby wrote:
 On Jun 7, 2:41 pm, Jon Harrop j...@ffconsultancy.com wrote:
 Arved Sandstrom wrote:
  Jon Harrop wrote:
  I see no problem with mutable shared state.

  In which case, Jon, you're in a small minority.

 No. Most programmers still care about performance
 
 Frequently when they shouldn't.

I disagree. A lot of software is still far too slow because the programmers
failed to pay attention to performance. Blogspot in Firefox being one
example: it can barely keep up with my typing!

 and performance means mutable state.
 
 Hm, not sure Erlangers would wholly agree.

Erlang is about concurrency. This thread is about parallelism.

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?u


Re: multi-core software

2009-06-09 Thread George Neuner
On Tue, 9 Jun 2009 10:47:11 -0700 (PDT), toby
t...@telegraphics.com.au wrote:

On Jun 7, 2:41 pm, Jon Harrop j...@ffconsultancy.com wrote:
 Arved Sandstrom wrote:
  Jon Harrop wrote:

 performance means mutable state.

Hm, not sure Erlangers would wholly agree.

Erlang uses quite a bit of mutable state behind the scenes ... the
programmers just don't see it.

George


Re: multi-core software

2009-06-09 Thread Dimiter malkia Stanev

Erlang uses quite a bit of mutable state behind the scenes ... the
programmers just don't see it.

George


Heh... The CPUs use quite a bit of mutable state behind the scenes ... 
the programmers just don't see it.


Actually, with CPUs they see it more than with, say, Erlang (that's why you 
need to use fences/barriers/locked access here and there).
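
As a rough illustration of where those fences become visible to the programmer, here is a minimal C11 sketch of publishing a value from one thread to another. The names `payload`, `ready`, `publish` and `try_consume` are made up for this example. The release store and acquire load are the "barrier" part: they keep the write to `payload` from being observed after the flag is seen as set.

```c
#include <stdatomic.h>
#include <stdbool.h>

static int payload;                 /* ordinary, non-atomic shared data */
static atomic_bool ready = false;   /* flag that guards the payload */

/* Writer side: store the data, then set the flag with release ordering so
 * the payload write cannot be reordered after the flag becomes visible. */
static void publish(int value)
{
    payload = value;
    atomic_store_explicit(&ready, true, memory_order_release);
}

/* Reader side: an acquire load of the flag pairs with the release store;
 * if the flag is seen as set, the payload write is guaranteed visible. */
static bool try_consume(int *out)
{
    if (!atomic_load_explicit(&ready, memory_order_acquire))
        return false;
    *out = payload;
    return true;
}
```

In a language like Erlang the runtime does the equivalent work when a message is handed from one process to another, which is why the programmer never sees it.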



Re: multi-core software

2009-06-09 Thread Emile van Sebille

On 6/9/2009 11:59 AM Jon Harrop said...
 toby wrote:
  On Jun 7, 2:41 pm, Jon Harrop j...@ffconsultancy.com wrote:

snip

   No. Most programmers still care about performance

  Frequently when they shouldn't.

 I disagree. A lot of software is still far too slow because the programmers
 failed to pay attention to performance.

For a properly written spec, performance is spec'd and paid for.

 Blogspot in Firefox being one
 example: it can barely keep up with my typing!

There are so many ways that's not the program.

Performance, accuracy, cost -- pick two, sacrifice one.

Emile



Re: multi-core software

2009-06-09 Thread Arved Sandstrom

Jon Harrop wrote:
 Arved Sandstrom wrote:
  Jon Harrop wrote:
   Arved Sandstrom wrote:
    Lew wrote:
     Interesting distinction.  Would it be fair to compare concurrent
     programming to the bricks used to build the parallel program's edifice?

    Way too much of a fine distinction. While they are in fact different,
    the point of concurrent programming is to structure programs as a group
    of computations, which can be executed in parallel (however that might
    actually be done depending on how many processors there are).

   No. Concurrent programming is about interleaving computations in order to
   reduce latency. Nothing to do with parallelism.

  Jon, I do concurrent programming all the time, as do most of my peers.
  Way down on the list of why we do it is the reduction of latency.

 What is higher on the list?


Correctness.

I'm not being facetious. I write applications that run on application 
servers, and from time to time I have had to write various special 
purpose servers. This kind of programming is all about managing 
concurrent execution of computations. The overarching concern is 
reliability and correct function. For many corporate situations, even 
with hundreds of users, the actual load at any instant is low enough 
that the various servers involved are nowhere close to being stressed 
out - performance is a secondary issue.


AHS


Re: multi-core software

2009-06-08 Thread Piet van Oostrum
By the way, there is a series of articles about concurrency on ACM Queue
which may be interesting for those participating in or just following
this discussion:

http://queue.acm.org/listing.cfm?item_topic=Concurrency&qc_type=theme_list&filter=Concurrency&page_title=Concurrency

Here is one introductory paragraph from one of the articles:

Parallel programming poses many new challenges to the developer, one of
which is synchronizing concurrent access to shared memory by multiple
threads. Programmers have traditionally used locks for synchronization,
but lock-based synchronization has well-known pitfalls. Simplistic
coarse-grained locking does not scale well, while more sophisticated
fine-grained locking risks introducing deadlocks and data races.
Furthermore, scalable libraries written using fine-grained locks cannot
be easily composed in a way that retains scalability and avoids deadlock
and data races. 
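
To make the deadlock risk in that paragraph concrete, here is a small sketch in C with POSIX mutexes. The `account` type and `transfer` function are hypothetical examples, not taken from the article. With one lock per account (fine-grained locking), two threads transferring in opposite directions could each grab one lock and wait forever for the other; the usual workaround, shown here, is to always take the locks in a fixed order.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical example type: one lock per account (fine-grained locking). */
struct account {
    pthread_mutex_t lock;
    long balance;
};

static void account_init(struct account *a, long balance)
{
    pthread_mutex_init(&a->lock, NULL);
    a->balance = balance;
}

/* Move `amount` between accounts. Both locks are always taken in address
 * order, so two concurrent transfers in opposite directions cannot end up
 * each holding one lock while waiting for the other.
 * Returns 1 on success, 0 if the source has insufficient funds. */
static int transfer(struct account *from, struct account *to, long amount)
{
    if (from == to)
        return 0;

    struct account *first = (uintptr_t)from < (uintptr_t)to ? from : to;
    struct account *second = (first == from) ? to : from;

    pthread_mutex_lock(&first->lock);
    pthread_mutex_lock(&second->lock);

    int ok = from->balance >= amount;
    if (ok) {
        from->balance -= amount;
        to->balance += amount;
    }

    pthread_mutex_unlock(&second->lock);
    pthread_mutex_unlock(&first->lock);
    return ok;
}
```

The composition problem is visible even here: the ordering rule only works if every piece of code that touches two accounts follows it, which is hard to guarantee across independently written libraries.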
-- 
Piet van Oostrum p...@cs.uu.nl
URL: http://pietvanoostrum.com [PGP 8DAE142BE17999C4]
Private email: p...@vanoostrum.org


Re: multi-core software

2009-06-08 Thread Paul Wallich

rossb...@mpi-sws.org wrote:

On Jun 8, 6:28 am, Ken T. nowh...@home.com wrote:

Let's not forget Elite for the 6502 exploiting predictable performance
in order to switch graphics modes partway down the vsync!

That actually didn't require predictable timing.  You could tell the
video chip to send you an interrupt when it got to a given scan line.  I
used this myself.  


I don't know what Elite did, but I know for sure that it was a common
trick on the Atari ST to switch color palettes or graphics mode at a
fixed point *in each single scan line* to get more colors, or display
graphics on the screen borders. That required synchronous
programming, i.e. counting clock cycles of machine instructions such
that for every point in the program you knew exactly where the
electron ray would be.

The Atari ST had an M68000 with exactly 8 MHz, which made this
possible. There were no caches in those times, and clock cycles were
entirely predictable.


The usual trick for these machines was an exact multiple of the NTSC 
color clock, which was approx 3.58 MHz. The 8-bit Atari video games 
and home computers all used this technique, as did the C-64/128. 
68000-based machines (such as the ST and the Amiga) could not only 
exploit that synchrony, they could also (this was in the days before the 
memory wall) exploit the fact that a 680x0 typically accessed memory only 
once every 4 clock cycles to do DMA from the same memory when the CPU wasn't 
using it. High display resolutions would lock the processor out of RAM 
except during blanking intervals. (Talk about contention and hot spots.)


Figuring out how to reuse resources most effectively was pretty much the 
same as the register-allocation problem for compilers, and was sometimes 
solved using the same kinds of graph-coloring algorithms...



Re: multi-core software

2009-06-08 Thread Seamus MacRae

Piet van Oostrum wrote:

By the way, there is a series of articles about concurrency on ACM Queue
which may be interesting for those participating in or just following
this discussion:

http://queue.acm.org/listing.cfm?item_topic=Concurrency&qc_type=theme_list&filter=Concurrency&page_title=Concurrency

Here is one introductory paragraph from one of the articles:

Parallel programming poses many new challenges to the developer, one of
which is synchronizing concurrent access to shared memory by multiple
threads. Programmers have traditionally used locks for synchronization,
but lock-based synchronization has well-known pitfalls. Simplistic
coarse-grained locking does not scale well, while more sophisticated
fine-grained locking risks introducing deadlocks and data races.
Furthermore, scalable libraries written using fine-grained locks cannot
be easily composed in a way that retains scalability and avoids deadlock
and data races. 


Is that the one about transactional memory?


Re: multi-core software

2009-06-08 Thread rossberg
On Jun 8, 6:28 am, Ken T. nowh...@home.com wrote:

  Let's not forget Elite for the 6502 exploiting predictable performance
  in order to switch graphics modes partway down the vsync!

 That actually didn't require predictable timing.  You could tell the
 video chip to send you an interrupt when it got to a given scan line.  I
 used this myself.  

I don't know what Elite did, but I know for sure that it was a common
trick on the Atari ST to switch color palettes or graphics mode at a
fixed point *in each single scan line* to get more colors, or display
graphics on the screen borders. That required synchronous
programming, i.e. counting clock cycles of machine instructions such
that for every point in the program you knew exactly where the
electron ray would be.

The Atari ST had an M68000 with exactly 8 MHz, which made this
possible. There were no caches in those times, and clock cycles were
entirely predictable.
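
The cycle-counting bookkeeping behind this trick can be sketched in C. The 8 MHz clock is from the post; the 15,625 Hz line rate is an assumption (the nominal PAL figure, and the real hardware may differ slightly), and both function names are made up for illustration.

```c
/* CPU cycles that elapse while the beam draws one scan line. */
static unsigned long cycles_per_scanline(unsigned long cpu_hz,
                                         unsigned long line_hz)
{
    return cpu_hz / line_hz;
}

/* Cycles still to burn (e.g. with padding instructions) before the beam
 * reaches a target position within the line.
 *   elapsed:  cycles counted since some line start
 *   target:   desired cycle offset within a line, must be < per_line
 *   per_line: cycles per scan line
 * If the target has already passed, this waits for it on the next line. */
static unsigned long cycles_to_wait(unsigned long elapsed,
                                    unsigned long target,
                                    unsigned long per_line)
{
    elapsed %= per_line;
    return (target + per_line - elapsed) % per_line;
}
```

Under those assumptions there are 512 cycles per line, and hitting the same beam position on every line means every code path between two switches has to add up to exactly that many cycles, which is only possible when instruction timings are fully predictable.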


Re: multi-core software

2009-06-07 Thread Jeff M.
On Jun 7, 1:56 am, Paul Rubin http://phr...@nospam.invalid wrote:
 Jeff M. mass...@gmail.com writes:
Even the lightest weight
user space (green) threads need a few hundred instructions, minimum,
to amortize the cost of context switching
  There's always a context switch. It's just whether or not you are
  switching in/out a virtual stack and registers for the context or the
  hardware stack/registers.

 I don't see the hundreds of instructions in that case.  

 http://shootout.alioth.debian.org/u32q/benchmark.php?test=threadring;...

 shows GHC doing 50 million lightweight thread switches in 8.47
 seconds, passing a token around a thread ring.  Almost all of that is
 probably spent acquiring and releasing the token's lock as the token
 is passed from one thread to another.  That simply doesn't leave time
 for hundreds of instructions per switch.

Who said there has to be? Sample code below (just to get the point
across):

struct context {
   vir_reg pc, sp, bp, ... ;
   object* stack;

   // ...

   context* next;
};

struct vm {
   context* active_context;
};

void switch_context(vm* v)
{
   // maybe GC v->active_context before switching

   v->active_context = v->active_context->next;
}

Also, there aren't hundreds of instructions with multiplexing,
either. It's all done in hardware. Take a look at the disassembly for
any application: one that uses native threads on a platform that
supports preemption. You won't see any instructions anywhere in the
program that perform a context switch. If you did that would be
absolutely horrible. Imagine if the compiler did something like this:

while(1)
{
  // whatever
}

do_context_switch_here();

That would suck. ;-)

That's not to imply that there isn't a cost; there's always a cost.
The example above just goes to show that for green threads, the cost
[of the switch] can be reduced down to a single pointer assignment.
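
For anyone who wants to compile the idea, here is a self-contained C sketch of the same pointer-swap switch over a ring of contexts. It is deliberately simplified and partly hypothetical: the `id` field, `make_ring` and `run_switches` are additions for illustration, and a real VM context would also carry virtual registers and a stack.

```c
#include <stddef.h>

/* Simplified context: a real one would also hold virtual registers
 * and a stack. The id field exists only so the result is observable. */
struct context {
    int id;
    struct context *next;   /* next context in the run ring */
};

struct vm {
    struct context *active_context;
};

/* Link n contexts into a circular run queue: 0 -> 1 -> ... -> n-1 -> 0. */
static void make_ring(struct context *ctxs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        ctxs[i].id = (int)i;
        ctxs[i].next = &ctxs[(i + 1) % n];
    }
}

/* The switch itself: a single pointer assignment. */
static void switch_context(struct vm *v)
{
    v->active_context = v->active_context->next;
}

/* Perform a number of switches and report which context ends up active. */
static int run_switches(struct vm *v, unsigned long switches)
{
    while (switches--)
        switch_context(v);
    return v->active_context->id;
}
```

Timing `run_switches` with a large count would give a feel for the raw cost of the switch, though it leaves out whatever scheduling, locking or GC work a real runtime does around it.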

Jeff M.


Re: multi-core software

2009-06-07 Thread Jon Harrop
Roedy Green wrote:
 On Fri, 5 Jun 2009 18:15:00 + (UTC), Kaz Kylheku
 kkylh...@gmail.com wrote, quoted or indirectly quoted someone who
 said :
Even for problems where it appears trivial, there can be hidden
issues, like false cache coherency communication where no actual
sharing is taking place. Or locks that appear to have low contention and
negligible performance impact on ``only'' 8 processors suddenly turn into
bottlenecks. Then there is NUMA. A given address in memory may be
RAM attached to the processor accessing it, or to another processor,
with very different access costs.
 
 Could what you are saying be summed up by saying, The more threads
 you have the more important it is to keep your threads independent,
 sharing as little data as possible.
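
The "false cache coherency communication" mentioned above is usually called false sharing: two threads update different variables that happen to sit on the same cache line, so the line bounces between cores even though no data is logically shared. A common mitigation is to pad per-thread data out to a full line. The sketch below is illustrative only; the 64-byte line size is an assumption that varies by CPU, and all type and function names are made up.

```c
#include <stdalign.h>
#include <stdbool.h>
#include <stdint.h>

#define CACHE_LINE 64   /* assumed cache line size in bytes */

/* One counter per thread, each aligned (and therefore padded) to its own
 * cache line so that updates by different threads do not collide. */
struct padded_counter {
    alignas(CACHE_LINE) long value;
};

struct per_thread_counters {
    struct padded_counter a;   /* updated only by thread A */
    struct padded_counter b;   /* updated only by thread B */
};

/* True if two addresses fall on the same (assumed) cache line. */
static bool same_cache_line(const void *p, const void *q)
{
    return (uintptr_t)p / CACHE_LINE == (uintptr_t)q / CACHE_LINE;
}
```

The trade-off is memory: each counter now occupies a whole line instead of a few bytes, which is cheap for a handful of hot variables and wasteful for large arrays.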

I see no problem with mutable shared state.

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?u


Re: multi-core software

2009-06-07 Thread Jon Harrop
George Neuner wrote:
 On Fri, 05 Jun 2009 16:26:37 -0700, Roedy Green
 see_webs...@mindprod.com.invalid wrote:
On Fri, 5 Jun 2009 18:15:00 + (UTC), Kaz Kylheku
kkylh...@gmail.com wrote, quoted or indirectly quoted someone who
said :
Even for problems where it appears trivial, there can be hidden
issues, like false cache coherency communication where no actual
sharing is taking place. Or locks that appear to have low contention and
negligible performance impact on ``only'' 8 processors suddenly turn into
bottlenecks. Then there is NUMA. A given address in memory may be
RAM attached to the processor accessing it, or to another processor,
with very different access costs.

Could what you are saying be summed up by saying, The more threads
you have the more important it is to keep your threads independent,
sharing as little data as possible.
 
 And therein lies the problem of leveraging many cores.  There is a lot
 of potential parallelism in programs (even in Java :) that is lost
 because it is too fine a grain for threads.

That will always be true so it conveys no useful information to the
practitioner.

 Even the lightest weight 
 user space (green) threads need a few hundred instructions, minimum,
 to amortize the cost of context switching.

Work items in Cilk are much faster than that.

 Add to that the fact that programmers have shown themselves, on
 average, to be remarkably bad at figuring out what _should_ be done in
 parallel - as opposed to what _can_ be done - and you've got a clear
 indicator that threads, as we know them, are not scalable except under
 a limited set of conditions.

Parallelism is inherently not scalable. I see no merit in speculating about
the ramifications of average programmers' alleged inabilities.

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?u


Re: multi-core software

2009-06-07 Thread Lew

Scott David Daniels wrote:

the nub of the problem is not on the benchmarks.  There is something
to be said for the good old days when you looked up the instruction
timings that you used in a little document for your machine, and could
know the cost of any loop.  We are faster now, but part of the cost of
that speed is that timing is a black art.  


Those good old days never existed.  Those manuals never accounted for things 
that affected timing even then, like memory latency or refresh time.  SRAM 
cache made things worse, since the published timings never mentioned 
cache-miss delays.  Though memory cache might seem a recent innovation, it's 
been around a while.  It would be challenging to find any published timing 
since the commercialization of computers that would actually tell the cost of 
any loop.


Things got worse when chips like the '86 family acquired multiple instructions 
for doing loops, still worse when pre-fetch pipelines became deeper and wider, 
absolutely Dark Art due to multi-level memory caches becoming universal, and 
throw-your-hands-up-and-leave-for-the-corner-bar with multiprocessor NUMA 
systems.  OSes and high-level languages complicate the matter - you never know 
how much time slice you'll get or how your source got compiled or optimized by 
run-time.


So the good old days are a matter of degree and self-deception - it was easier 
to fool ourselves then that we could at least guess timings proportionately if 
not absolutely, but things definitely get more unpredictable over evolution.


--
Lew


Re: multi-core software

2009-06-07 Thread Arved Sandstrom

Jon Harrop wrote:

Roedy Green wrote:

On Fri, 5 Jun 2009 18:15:00 + (UTC), Kaz Kylheku
kkylh...@gmail.com wrote, quoted or indirectly quoted someone who
said :

Even for problems where it appears trivial, there can be hidden
issues, like false cache coherency communication where no actual
sharing is taking place. Or locks that appear to have low contention and
negligible performance impact on ``only'' 8 processors suddenly turn into
bottlenecks. Then there is NUMA. A given address in memory may be
RAM attached to the processor accessing it, or to another processor,
with very different access costs.

Could what you are saying be summed up by saying, The more threads
you have the more important it is to keep your threads independent,
sharing as little data as possible.


I see no problem with mutable shared state.


In which case, Jon, you're in a small minority.

AHS


Re: multi-core software

2009-06-07 Thread Jon Harrop
Arved Sandstrom wrote:
 Jon Harrop wrote:
 I see no problem with mutable shared state.

 In which case, Jon, you're in a small minority.

No. Most programmers still care about performance and performance means
mutable state.

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?u


Re: multi-core software

2009-06-07 Thread Joshua Cranmer

Jon Harrop wrote:

No. Most programmers still care about performance and performance means
mutable state.


[ Citation needed ].

Most programmers I've met could care less about performance.

--
Beware of bugs in the above code; I have only proved it correct, not 
tried it. -- Donald E. Knuth



Re: multi-core software

2009-06-07 Thread Patricia Shanahan

Jon Harrop wrote:

Arved Sandstrom wrote:

Jon Harrop wrote:

I see no problem with mutable shared state.

In which case, Jon, you're in a small minority.


No. Most programmers still care about performance and performance means
mutable state.



I don't see why that would affect whether one thinks there are problems.

In my opinion, shared mutable state has a lot of problems. It is also
sometimes the best design for performance reasons.

Patricia


Re: multi-core software

2009-06-07 Thread Jon Harrop
Arved Sandstrom wrote:
 Jon Harrop wrote:
 Arved Sandstrom wrote:
 Jon Harrop wrote:
 I see no problem with mutable shared state.

 In which case, Jon, you're in a small minority.
 
 No. Most programmers still care about performance and performance means
 mutable state.
 
 Quite apart from performance and mutable state, I believe we were
 talking about mutable _shared_ state. And this is something that gets a
 _lot_ of people into trouble.

Nonsense. Scientists have been writing parallel programs for decades using
shared state extensively without whining about it. Databases are mutable
shared state but millions of database programmers solve real problems every
day without whining about it.

Use your common sense and you can write efficient parallel programs today
with little difficulty. I do.

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?u


Re: multi-core software

2009-06-07 Thread Jon Harrop
Joshua Cranmer wrote:
 Jon Harrop wrote:
 No. Most programmers still care about performance and performance means
 mutable state.
 
 [ Citation needed ].
 
 Most programmers I've met could care less about performance.

Then they have no need for parallelism in the first place.

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?u


Re: multi-core software

2009-06-07 Thread Jeff M.
On Jun 7, 3:19 pm, Arved Sandstrom dces...@hotmail.com wrote:
 Jon Harrop wrote:
  Arved Sandstrom wrote:
  Jon Harrop wrote:
  I see no problem with mutable shared state.
  In which case, Jon, you're in a small minority.

  No. Most programmers still care about performance and performance means
  mutable state.

 Quite apart from performance and mutable state, I believe we were
 talking about mutable _shared_ state. And this is something that gets a
 _lot_ of people into trouble.


Mutable shared state gets _bad_ (err.. perhaps inexperienced would
be a better adjective) programmers - who don't know what they are
doing - in trouble. There are many problem domains that either benefit
greatly from mutable shared states or can't [easily] be done without
them. Unified memory management being an obvious example... there are
many more. Unshared state has its place. Immutable state has its
place. Shared immutable state has its place. Shared mutable state has
its place.

Jeff M.


Re: multi-core software

2009-06-07 Thread Lew

Jon Harrop wrote:

I see no problem with mutable shared state.

In which case, Jon, you're in a small minority.


Patricia Shanahan wrote:

In my opinion, shared mutable state has a lot of problems. It is also
sometimes the best design for performance reasons.


As Dr. Jon pointed out upthread, one can write decent code with mutable shared 
state.  It is also true that mutable state presents a lot of problems - 
potential problems, ones that can be solved, but not ones that can be solved 
thoughtlessly.  On the flip side, one can write a tremendous amount of 
effective multi-threaded code involving shared mutable state with attention to 
a few rules of thumb, like always synchronize access and don't use different 
monitors to do so.
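A minimal sketch of those two rules of thumb, assuming Python's threading module rather than the Java monitors under discussion. The Counter class and the run helper are hypothetical names used only for illustration. Every read and every write of the shared value goes through the same single lock:

```python
import threading


class Counter:
    """Shared mutable state guarded by exactly one lock."""

    def __init__(self):
        self._lock = threading.Lock()  # the only monitor for this state
        self._value = 0

    def increment(self):
        with self._lock:               # every write takes the same lock
            self._value += 1

    def value(self):
        with self._lock:               # reads are synchronized as well
            return self._value


def run(n_threads=8, n_increments=10_000):
    """Increment one shared Counter from several threads; return the total."""
    counter = Counter()

    def worker():
        for _ in range(n_increments):
            counter.increment()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value()
```

With the lock in place, run(8, 10_000) returns exactly 80000. Guarding increment with one lock and value with another would break the second rule, because the two locks would no longer exclude each other.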


Unlike some environments (e.g., database management systems), Java's tools to 
manage concurrency are explicit and low level.  The programmer's job is to 
make sure those tools are used correctly to avoid problems.  As long as they 
do that, then there is no special problem with shared mutable state.


There is, however, a cost.  Certain things must happen slower when you share 
mutable state, than when you share immutable state or don't share state. 
Certain things must happen when you share mutable state, regardless of speed, 
because without them your code doesn't work.  For some reason, concurrent 
programming is an area often not well understood by a significant percentage 
of workaday programmers.  When problems do arise, they tend to be 
probabilistic in nature and vary widely with system characteristics like 
attempted load.


So the meeting ground is, yes, concurrent mutable state can present problems 
if not properly managed.  Properly managing such is not necessarily a huge 
burden, but it must be borne.  When done properly, shared mutable state will 
not present problems in production.


--
Lew


Re: multi-core software

2009-06-07 Thread Jon Harrop
Jeff M. wrote:
 On Jun 7, 3:19 pm, Arved Sandstrom dces...@hotmail.com wrote:
 Jon Harrop wrote:
  Arved Sandstrom wrote:
  Jon Harrop wrote:
  I see no problem with mutable shared state.
  In which case, Jon, you're in a small minority.

  No. Most programmers still care about performance and performance means
  mutable state.

 Quite apart from performance and mutable state, I believe we were
 talking about mutable _shared_ state. And this is something that gets a
 _lot_ of people into trouble.
 
 Mutable shared state gets _bad_ (err.. perhaps inexperienced would
 be a better adjective) programmers - who don't know what they are
 doing - in trouble. There are many problem domains that either benefit
 greatly from mutable shared states or can't [easily] be done without
 them. Unified memory management being an obvious example... there are
 many more. Unshared state has its place. Immutable state has its
 place. Shared immutable state has its place. Shared mutable state has
 its place.

Exactly. I don't believe that shared mutable state is any harder to do
correctly than the next solution and it is almost always the most efficient
solution and the sole purpose of writing parallel programs to leverage
multicores is performance.

A bad developer can screw up anything. I see no reason to think that shared
mutable state is any more fragile than the next thing a bad developer can
screw up.

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?u


Re: multi-core software

2009-06-07 Thread Jon Harrop
Lew wrote:
 As Dr. Jon pointed out upthread, one can write decent code with mutable
 shared
 state.  It is also true that mutable state presents a lot of problems -
 potential problems, ones that can be solved, but not ones that can be
 solved
 thoughtlessly.  On the flip side, one can write a tremendous amount of
 effective multi-threaded code involving shared mutable state with
 attention to a few rules of thumb, like always synchronize access and
 don't use different monitors to do so.
 
 Unlike some environments (e.g., database management systems), Java's tools
 to
 manage concurrency are explicit and low level.  The programmer's job is to
 make sure those tools are used correctly to avoid problems.  As long as
 they do that, then there is no special problem with shared mutable state.
 
 There is, however, a cost.  Certain things must happen slower when you
 share mutable state, than when you share immutable state or don't share
 state. Certain things must happen when you share mutable state, regardless
 of speed,
 because without them your code doesn't work.  For some reason, concurrent
 programming is an area often not well understood by a significant
 percentage
 of workaday programmers.  When problems do arise, they tend to be
 probabilistic in nature and vary widely with system characteristics like
 attempted load.
 
 So the meeting ground is, yes, concurrent mutable state can present
 problems
 if not properly managed.  Properly managing such is not necessarily a huge
 burden, but it must be borne.  When done properly, shared mutable state
 will not present problems in production.

I agree entirely but my statements were about parallelism and not
concurrency. Parallel and concurrent programming have wildly different
characteristics and solutions. I don't believe shared mutable state is
overly problematic in the context of parallelism. Indeed, I think it is
usually the best solution in that context.

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?u


Re: multi-core software

2009-06-07 Thread Lew

Jon Harrop wrote:

I agree entirely but my statements were about parallelism and not
concurrency. Parallel and concurrent programming have wildly different
characteristics and solutions. I don't believe shared mutable state is
overly problematic in the context of parallelism. Indeed, I think it is
usually the best solution in that context.


Interesting distinction.  Would it be fair to compare concurrent programming 
to the bricks used to build the parallel program's edifice?


--
Lew


Re: multi-core software

2009-06-07 Thread Ken T.
On Sun, 07 Jun 2009 11:16:46 -0400, Lew wrote:

 So the good old days are a matter of degree and self-deception - it was
 easier to fool ourselves then that we could at least guess timings
 proportionately if not absolutely, but things definitely get more
 unpredictable over evolution.

As I recall I could get exact timings on my 6502 based Commodore 64.  The 
issues you speak of simply weren't issues. 

-- 
Ken T.
http://www.electricsenator.net

  Never underestimate the power of stupid people in large groups.


Re: multi-core software

2009-06-07 Thread Arved Sandstrom

Lew wrote:

Jon Harrop wrote:

I agree entirely but my statements were about parallelism and not
concurrency. Parallel and concurrent programming have wildly different
characteristics and solutions. I don't believe shared mutable state is
overly problematic in the context of parallelism. Indeed, I think it is
usually the best solution in that context.


Interesting distinction.  Would it be fair to compare concurrent 
programming to the bricks used to build the parallel program's edifice?


Way too much of a fine distinction. While they are in fact different, 
the point of concurrent programming is to structure programs as a group 
of computations, which can be executed in parallel (however that might 
actually be done depending on how many processors there are). Parallel 
computing means to carry out many computations simultaneously. These are 
interleaved definitions. And they are *not* wildly different.


If you talk about shared mutable state, it is not as easy to use as Dr 
Harrop seems to think it is. Maybe in his experience it has been, but in 
general it's no trivial thing to manage. Lew, you probably summarized it 
best a few posts upstream.


AHS


Re: multi-core software

2009-06-07 Thread Jon Harrop
Arved Sandstrom wrote:
 Lew wrote:
 Interesting distinction.  Would it be fair to compare concurrent
 programming to the bricks used to build the parallel program's edifice?
 
 Way too much of a fine distinction. While they are in fact different,
 the point of concurrent programming is to structure programs as a group
 of computations, which can be executed in parallel (however that might
 actually be done depending on how many processors there are).

No. Concurrent programming is about interleaving computations in order to
reduce latency. Nothing to do with parallelism.

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?u


Re: multi-core software

2009-06-07 Thread Jon Harrop
Lew wrote:
 Jon Harrop wrote:
 I agree entirely but my statements were about parallelism and not
 concurrency. Parallel and concurrent programming have wildly different
 characteristics and solutions. I don't believe shared mutable state is
 overly problematic in the context of parallelism. Indeed, I think it is
 usually the best solution in that context.
 
 Interesting distinction.  Would it be fair to compare concurrent
 programming to the bricks used to build the parallel program's edifice?

Concurrent programming certainly underpins the foundations of almost all
parallel programs. Not least at the level of the OS scheduling the threads
than run the parallel programs. However, that knowledge is probably more
confusing than helpful here.

In essence, concurrent programming is concerned with reducing latency (e.g.
evading blocking) by interleaving computations whereas parallel programming
is concerned with maximizing throughput by performing computations at the
same time.

Historically, concurrency has been of general interest on single core
machines in the context of operating systems and IO and has become more
important recently due to the ubiquity of web programming. Parallelism was
once only important to computational scientists programming shared-memory
supercomputers and enterprise developers programming distributed-memory
clusters but the advent of multicore machines on the desktop and in the
games console has pushed parallelism into the limelight for ordinary
developers when performance is important.

Solutions for concurrent and parallel programming are also wildly different.
Concurrent programming typically schedules work items that are expected to
block on a shared queue for a pool of dozens or even hundreds of threads.
Parallel programming typically schedules work items that are expected to
not block on wait-free work-stealing queues for a pool of n threads
where n is the number of cores. Solutions for concurrent programming
(such as the .NET thread pool and asynchronous workflows in F#) can be used
as a poor man's solution for parallel programming but the overheads are
huge because they were not designed for this purpose so performance is much
worse than necessary. Solutions for parallel programming (e.g. Cilk, the
Task Parallel Library) are virtually useless for concurrent programming
because you quickly end up with all n threads blocked and the whole
program stalls.
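A rough sketch of the blocking half of that argument, assuming Python's concurrent.futures and using time.sleep as a stand-in for a work item that blocks on I/O. The function names and the pool sizes are illustrative assumptions, and nothing here models work-stealing queues. It only shows what happens when blocking items land on a pool sized like a core count:

```python
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_item(delay):
    """Stand-in for a work item that blocks, e.g. on I/O."""
    time.sleep(delay)
    return delay


def elapsed_with_pool(n_threads, n_items=16, delay=0.05):
    """Run n_items blocking items on n_threads threads; return wall time."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        list(pool.map(blocking_item, [delay] * n_items))
    return time.perf_counter() - start


if __name__ == "__main__":
    # Pool sized like a small core count: threads sit blocked, items queue up.
    print("2 threads: ", elapsed_with_pool(n_threads=2))
    # Pool sized for blocking work: all items wait at the same time.
    print("16 threads:", elapsed_with_pool(n_threads=16))
```

The small pool takes several times longer because each thread spends its time blocked while the remaining items wait in the queue, which is the stall described above in miniature.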

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?u


Re: multi-core software

2009-06-07 Thread Patricia Shanahan

Jon Harrop wrote:
...

Historically, concurrency has been of general interest on single core
machines in the context of operating systems and IO and has become more
important recently due to the ubiquity of web programming. Parallelism was
once only important to computational scientists programming shared-memory
supercomputers and enterprise developers programming distributed-memory
clusters but the advent of multicore machines on the desktop and in the
games console has pushed parallelism into the limelight for ordinary
developers when performance is important.

...

Parallelism has also been important, for a long time, to multiprocessor
operating system developers. I got my first exposure to parallel
programming, in the 1980's, working on NCR's VRX operating system.

Patricia


Re: multi-core software

2009-06-07 Thread Dave Angel

Lew wrote:
Scott David Daniels wrote:

the nub of the problem is not on the benchmarks.  There is something
to be said for the good old days when you looked up the instruction
timings that you used in a little document for your machine, and could
know the cost of any loop.  We are faster now, but part of the cost of
that speed is that timing is a black art.  


Those good old days never existed.  Those manuals never accounted for 
things that affected timing even then, like memory latency or refresh 
time.  SRAM cache made things worse, since the published timings never 
mentioned cache-miss delays.  Though memory cache might seem a recent 
innovation, it's been around a while.  It would be challenging to find 
any published timing since the commercialization of computers that 
would actually tell the cost of any loop.


Things got worse when chips like the '86 family acquired multiple 
instructions for doing loops, still worse when pre-fetch pipelines 
became deeper and wider, absolutely Dark Art due to multi-level memory 
caches becoming universal, and 
throw-your-hands-up-and-leave-for-the-corner-bar with multiprocessor 
NUMA systems.  OSes and high-level languages complicate the matter - 
you never know how much time slice you'll get or how your source got 
compiled or optimized by run-time.


So the good old days are a matter of degree and self-deception - it 
was easier to fool ourselves then that we could at least guess timings 
proportionately if not absolutely, but things definitely get more 
unpredictable over evolution.


Nonsense.  The 6502 with static memory was precisely predictable, and 
many programmers (working in machine language, naturally) counted on 
it.  Similarly the Novix 4000, when programmed in its native Forth.


And previous to that, I worked on several machines (in fact, I wrote the 
assembler and debugger for two of them) where the only variable was the 
delay every two milliseconds for dynamic memory refresh.  Separate 
control memory and data memory, and every instruction precisely 
clocked.  No instruction prefetch, no cache memory.  What you see is 
what you get.


Would I want to go back there?  No.  Sub-megahertz clocks with much less 
happening on each clock means we were operating at way under .01% of 
present day.




Re: multi-core software

2009-06-07 Thread Jon Harrop
Ken T. wrote:
 On Sun, 07 Jun 2009 11:16:46 -0400, Lew wrote:
 So the good old days are a matter of degree and self-deception - it was
 easier to fool ourselves then that we could at least guess timings
 proportionately if not absolutely, but things definitely get more
 unpredictable over evolution.
 
 As I recall I could get exact timings on my 6502 based Commodore 64.  The
 issues you speak of simply weren't issues.

Let's not forget Elite for the 6502 exploiting predictable performance in
order to switch graphics modes partway down the vsync!

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?u


Re: multi-core software

2009-06-07 Thread Arved Sandstrom

Jon Harrop wrote:

Arved Sandstrom wrote:

Lew wrote:

Interesting distinction.  Would it be fair to compare concurrent
programming to the bricks used to build the parallel program's edifice?

Way too much of a fine distinction. While they are in fact different,
the point of concurrent programming is to structure programs as a group
of computations, which can be executed in parallel (however that might
actually be done depending on how many processors there are).


No. Concurrent programming is about interleaving computations in order to
reduce latency. Nothing to do with parallelism.


Jon, I do concurrent programming all the time, as do most of my peers. 
Way down on the list of why we do it is the reduction of latency.


AHS


Re: multi-core software

2009-06-07 Thread Jon Harrop
Arved Sandstrom wrote:
 Jon Harrop wrote:
 Arved Sandstrom wrote:
 Lew wrote:
 Interesting distinction.  Would it be fair to compare concurrent
 programming to the bricks used to build the parallel program's edifice?
 Way too much of a fine distinction. While they are in fact different,
 the point of concurrent programming is to structure programs as a group
 of computations, which can be executed in parallel (however that might
 actually be done depending on how many processors there are).
 
 No. Concurrent programming is about interleaving computations in order to
 reduce latency. Nothing to do with parallelism.
 
 Jon, I do concurrent programming all the time, as do most of my peers.
 Way down on the list of why we do it is the reduction of latency.

What is higher on the list?

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?u


Re: multi-core software

2009-06-07 Thread Ken T.
On Mon, 08 Jun 2009 02:39:40 +0100, Jon Harrop wrote:

 Ken T. wrote:
 On Sun, 07 Jun 2009 11:16:46 -0400, Lew wrote:
 So the good old days are a matter of degree and self-deception - it
 was easier to fool ourselves then that we could at least guess timings
 proportionately if not absolutely, but things definitely get more
 unpredictable over evolution.
 
 As I recall I could get exact timings on my 6502 based Commodore 64. 
 The issues you speak of simply weren't issues.
 
 Let's not forget Elite for the 6502 exploiting predictable performance
 in order to switch graphics modes partway down the vsync!

That actually didn't require predictable timing.  You could tell the 
video chip to send you an interrupt when it got to a given scan line.  I 
used this myself.  

Elite was cool though.  I wasted many hours playing that game. 

-- 
Ken T.
http://www.electricsenator.net

  Duct tape is like the force.  It has a light side, and a dark side,
  and it holds the universe together ...
-- Carl Zwanzig


Re: multi-core software

2009-06-07 Thread Paul Rubin
George Neuner gneun...@comcast.net writes:
 Even the lightest weight
 user space (green) threads need a few hundred instructions, minimum,
 to amortize the cost of context switching.

I thought the definition of green threads was that multiplexing them
doesn't require context switches.


Re: multi-core software

2009-06-07 Thread Paul Rubin
Jeff M. mass...@gmail.com writes:
   Even the lightest weight
   user space (green) threads need a few hundred instructions, minimum,
   to amortize the cost of context switching
 There's always a context switch. It's just whether or not you are
 switching in/out a virtual stack and registers for the context or the
 hardware stack/registers.

I don't see the hundreds of instructions in that case.  

http://shootout.alioth.debian.org/u32q/benchmark.php?test=threadring&lang=ghc&id=3

shows GHC doing 50 million lightweight thread switches in 8.47
seconds, passing a token around a thread ring.  Almost all of that is
probably spent acquiring and releasing the token's lock as the token
is passed from one thread to another.  That simply doesn't leave time
for hundreds of instructions per switch.
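The arithmetic behind that estimate can be written out as a small sketch. The two figures are the ones quoted above; the clock rate is not given in the post, so cycles_per_switch takes it as a parameter and any value passed in is an assumption:

```python
SWITCHES = 50_000_000   # thread switches, as quoted in the post
ELAPSED_S = 8.47        # elapsed seconds, as quoted in the post


def ns_per_switch(elapsed_s=ELAPSED_S, switches=SWITCHES):
    """Average wall time per switch, in nanoseconds."""
    return elapsed_s / switches * 1e9


def cycles_per_switch(clock_hz, elapsed_s=ELAPSED_S, switches=SWITCHES):
    """Average clock cycles per switch for an assumed clock rate in Hz."""
    return elapsed_s / switches * clock_hz


if __name__ == "__main__":
    print(ns_per_switch())   # about 169.4 ns per switch
```

That is roughly 169 ns per hop in total, and the figure includes acquiring and releasing the token's lock, so the switch itself has to fit in whatever remains.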


Re: multi-core software

2009-06-07 Thread Jon Harrop
Paul Rubin wrote:
 Jeff M. mass...@gmail.com writes:
   Even the lightest weight
   user space (green) threads need a few hundred instructions,
   minimum, to amortize the cost of context switching
 There's always a context switch. It's just whether or not you are
 switching in/out a virtual stack and registers for the context or the
 hardware stack/registers.
 
 I don't see the hundreds of instructions in that case.
 

http://shootout.alioth.debian.org/u32q/benchmark.php?test=threadring&lang=ghc&id=3
 
 shows GHC doing 50 million lightweight thread switches in 8.47
 seconds, passing a token around a thread ring.  Almost all of that is
 probably spent acquiring and releasing the token's lock as the token
 is passed from one thread to another.  That simply doesn't leave time
 for hundreds of instructions per switch.

And Haskell is not exactly fast...

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?u


Re: multi-core software

2009-06-07 Thread Jon Harrop
Scott David Daniels wrote:
 Lew wrote:
 Scott David Daniels wrote:
 the nub of the problem is not on the benchmarks.  There is something
 to be said for the good old days when you looked up the instruction
 timings that you used in a little document for your machine, and could
 know the cost of any loop.  We are faster now, but part of the cost of
 that speed is that timing is a black art.
 
 Those good old days never existed.  Those manuals never accounted for
 things that affected timing even then, like memory latency or refresh
 time.
 
 Well, as Gilbert and Sullivan wrote:
   - What, never?
   - No, never!
   - What, Never?
   - Well, hardly ever.
 Look up the LGP-30.  It was quite predictable.  It has been a while.

Same for early ARMs.

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?u


Re: multi-core software

2009-06-07 Thread Scott David Daniels

Lew wrote:

Scott David Daniels wrote:

the nub of the problem is not on the benchmarks.  There is something
to be said for the good old days when you looked up the instruction
timings that you used in a little document for your machine, and could
know the cost of any loop.  We are faster now, but part of the cost of
that speed is that timing is a black art.  


Those good old days never existed.  Those manuals never accounted for 
things that affected timing even then, like memory latency or refresh 
time.


Well, as Gilbert and Sullivan wrote:
 - What, never?
 - No, never!
 - What, Never?
 - Well, hardly ever.
Look up the LGP-30.  It was quite predictable.  It has been a while.

--Scott David Daniels
scott.dani...@acm.org


Re: multi-core software

2009-06-07 Thread Roedy Green
On Fri, 5 Jun 2009 18:15:00 + (UTC), Kaz Kylheku
kkylh...@gmail.com wrote, quoted or indirectly quoted someone who
said :

Even for problems where it appears trivial, there can be hidden
issues, like false cache coherency communication where no actual
sharing is taking place. Or locks that appear to have low contention and
negligible performance impact on ``only'' 8 processors suddenly turn into
bottlenecks. Then there is NUMA. A given address in memory may be
RAM attached to the processor accessing it, or to another processor,
with very different access costs.

Could what you are saying be summed up by saying, The more threads
you have the more important it is to keep your threads independent,
sharing as little data as possible.
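One way to picture "independent threads" is the sketch below, which assumes Python's threading module; parallel_sum is a hypothetical helper. Each thread accumulates into a private local variable and writes once to its own result slot, so no lock is needed until the results are combined after the joins. It shows the structure only: CPython's global interpreter lock means this particular code gains no speed from extra cores.

```python
import threading


def parallel_sum(data, n_threads=4):
    """Sum data with each thread writing only to its own result slot."""
    partial = [0] * n_threads                        # one slot per thread
    chunk = (len(data) + n_threads - 1) // n_threads  # ceiling division

    def worker(i):
        local = 0                                    # thread-private accumulator
        for x in data[i * chunk:(i + 1) * chunk]:
            local += x
        partial[i] = local                           # single write, own slot only

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(partial)                              # combine after all joins
```

In a lower-level language, adjacent slots like these could still land on the same cache line, which is the false sharing described in the quoted text, so logical independence does not by itself guarantee independence at the hardware level.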
-- 
Roedy Green Canadian Mind Products
http://mindprod.com

Never discourage anyone... who continually makes progress, no matter how slow.
~ Plato 428 BC died: 348 BC at age: 80


Re: multi-core software

2009-06-07 Thread Raymond Wiker
Roedy Green see_webs...@mindprod.com.invalid writes:

 On Fri, 5 Jun 2009 18:15:00 + (UTC), Kaz Kylheku
 kkylh...@gmail.com wrote, quoted or indirectly quoted someone who
 said :

Even for problems where it appears trivial, there can be hidden
issues, like false cache coherency communication where no actual
sharing is taking place. Or locks that appear to have low contention and
negligible performance impact on ``only'' 8 processors suddenly turn into
bottlenecks. Then there is NUMA. A given address in memory may be
RAM attached to the processor accessing it, or to another processor,
with very different access costs.

 Could what you are saying be summed up by saying, The more threads
 you have the more important it is to keep your threads independent,
 sharing as little data as possible.

Absolutely... not a new observation, either, as it follows
directly from Amdahl's law. 
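For reference, Amdahl's law as a small sketch. The function name and the example fractions are illustrative assumptions, not figures from the thread. If a fraction p of the work can run in parallel on n processors, the speedup is 1 / ((1 - p) + p / n):

```python
def amdahl_speedup(parallel_fraction, n_processors):
    """Upper bound on speedup when only part of the work runs in parallel."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if n_processors < 1:
        raise ValueError("n_processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_processors)


if __name__ == "__main__":
    for n in (2, 8, 64):
        # With 10% of the work serialized, extra processors help less and less.
        print(n, amdahl_speedup(0.9, n))
```

Time spent serialized on shared data counts toward the serial fraction, so the more the threads share, the lower the ceiling, however many processors are added.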


Re: multi-core software

2009-06-06 Thread George Neuner
On Fri, 05 Jun 2009 16:26:37 -0700, Roedy Green
see_webs...@mindprod.com.invalid wrote:

On Fri, 5 Jun 2009 18:15:00 + (UTC), Kaz Kylheku
kkylh...@gmail.com wrote, quoted or indirectly quoted someone who
said :

Even for problems where it appears trivial, there can be hidden
issues, like false cache coherency communication where no actual
sharing is taking place. Or locks that appear to have low contention and
negligible performance impact on ``only'' 8 processors suddenly turn into
bottlenecks. Then there is NUMA. A given address in memory may be
RAM attached to the processor accessing it, or to another processor,
with very different access costs.

Could what you are saying be summed up by saying, The more threads
you have the more important it is to keep your threads independent,
sharing as little data as possible.

And therein lies the problem of leveraging many cores.  There is a lot
of potential parallelism in programs (even in Java :) that is lost
because it is too fine a grain for threads.  Even the lightest weight
user space (green) threads need a few hundred instructions, minimum,
to amortize the cost of context switching.

Add to that the fact that programmers have shown themselves, on
average, to be remarkably bad at figuring out what _should_ be done in
parallel - as opposed to what _can_ be done - and you've got a clear
indicator that threads, as we know them, are not scalable except under
a limited set of conditions. 

George


Re: multi-core software

2009-06-06 Thread John Thingstad
On Sat, 06 Jun 2009 21:46:51 +0200, George Neuner
gneun...@comcast.net wrote:



On Fri, 05 Jun 2009 16:26:37 -0700, Roedy Green

Add to that the fact that programmers have shown themselves, on
average, to be remarkably bad at figuring out what _should_ be done in
parallel - as opposed to what _can_ be done - and you've got a clear
indicator that threads, as we know them, are not scalable except under
a limited set of conditions.

George


I find the dataflow model of concurrency in Oz interesting; it
addresses many of the issues you just mentioned.
See in particular the section 'Dataflow variables and declarative
concurrency' and onward.

http://en.wikipedia.org/wiki/Oz_(programming_language)
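
To give a feel for what a dataflow variable is, here is a rough Python
approximation. It is not Oz and not how Oz implements it: the class
DataflowVar and its bind/get methods are hypothetical names invented for
this sketch. The idea it tries to show is single assignment plus readers
that block until the value is bound.

```python
import threading


class DataflowVar:
    """Single-assignment variable; readers wait until it is bound."""

    def __init__(self):
        self._bound = threading.Event()
        self._lock = threading.Lock()
        self._value = None

    def bind(self, value):
        with self._lock:
            if self._bound.is_set():
                raise ValueError("dataflow variable is already bound")
            self._value = value
            self._bound.set()

    def get(self):
        # Blocks the calling thread until some other thread binds a value.
        self._bound.wait()
        return self._value


def demo():
    x, y, total = DataflowVar(), DataflowVar(), DataflowVar()

    def adder():
        # Runs as soon as both inputs exist, in whatever order they arrive.
        total.bind(x.get() + y.get())

    t = threading.Thread(target=adder)
    t.start()
    y.bind(2)
    x.bind(40)
    t.join()
    return total.get()
```

Because a variable can only be bound once, the result of demo() does not
depend on which thread runs first, and that is the property that makes
this style of concurrency declarative.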


-
John Thingstad


Re: multi-core software

2009-06-06 Thread Jeff M.
On Jun 6, 9:58 pm, Paul Rubin http://phr...@nospam.invalid wrote:
 George Neuner gneun...@comcast.net writes:
  Even the lightest weight
  user space (green) threads need a few hundred instructions, minimum,
  to amortize the cost of context switching.

 I thought the definition of green threads was that multiplexing them
 doesn't require context switches.

There's always a context switch. It's just whether or not you are
switching in/out a virtual stack and registers for the context or the
hardware stack/registers.
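
A toy illustration of that, assuming Python generators as the "green
threads": each generator keeps its own frame (locals and resume point),
and the scheduler below saves and restores it on every switch. The names
worker and run_round_robin are invented for this sketch.

```python
from collections import deque


def worker(name, steps, log):
    for i in range(steps):
        log.append((name, i))
        yield  # suspend: the frame (locals, resume point) is saved here


def run_round_robin(tasks):
    """Run generator-based tasks in turn; return the number of switches."""
    ready = deque(tasks)
    switches = 0
    while ready:
        task = ready.popleft()
        try:
            next(task)  # restore the task's saved frame and resume it
        except StopIteration:
            continue    # task finished, drop it
        switches += 1
        ready.append(task)
    return switches
```

No hardware registers are swapped by the kernel here, but every next()
still restores one virtual context and every yield saves one, so the
cost is smaller rather than zero.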

Jeff M.


Re: multi-core software

2009-06-05 Thread Roedy Green
On Thu, 4 Jun 2009 09:46:44 -0700 (PDT), Xah Lee xah...@gmail.com
wrote, quoted or indirectly quoted someone who said :

• Why Must Software Be Rewritten For Multi-Core Processors?

Threads have been part of Java since Day 1.  Using threads complicates
your code, but even with a single core processor, they can improve
performance, particularly if you are doing something like combing
multiple websites.
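
The same effect can be sketched in Python (used here instead of Java
only to keep the example short). The function fake_fetch is a stand-in
that sleeps instead of touching the network, and the site names passed
to it are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_fetch(site, delay=0.05):
    # Stands in for a network request: the thread is idle while waiting,
    # so other threads can run even on a single core.
    time.sleep(delay)
    return f"{site}: ok"


def comb_sites(sites, workers=8):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() keeps results in the same order as the input sites.
        return list(pool.map(fake_fetch, sites))
```

Calling comb_sites(["site-a.example", "site-b.example"]) should take
roughly one delay rather than two, because the waits overlap. The gain
comes from overlapping I/O, not from extra CPUs.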

The nice thing about Java is that whether you are on a single-core
processor or a 256-CPU machine (we got to run our Athena Integer Java
spreadsheet engine on such a beast) does not concern your code.

You just have to make sure your threads don't interfere with each
other, and Java and the OS handle exploiting all the CPUs available.

-- 
Roedy Green Canadian Mind Products
http://mindprod.com

Never discourage anyone... who continually makes progress, no matter how slow.
~ Plato 428 BC died: 348 BC at age: 80


Re: multi-core software

2009-06-05 Thread Kaz Kylheku
On 2009-06-05, Vend ven...@virgilio.it wrote:
 On Jun 4, 8:35 pm, Roedy Green see_webs...@mindprod.com.invalid
 wrote:
 On Thu, 4 Jun 2009 09:46:44 -0700 (PDT), Xah Lee xah...@gmail.com
 wrote, quoted or indirectly quoted someone who said :

 • Why Must Software Be Rewritten For Multi-Core Processors?

 Threads have been part of Java since Day 1.  Using threads complicates
 your code, but even with a single core processor, they can improve
 performance, particularly if you are doing something like combing
 multiple websites.

 The nice thing about Java is that whether you are on a single-core
 processor or a 256-CPU machine (we got to run our Athena Integer Java
 spreadsheet engine on such a beast) does not concern your code.

 You just have to make sure your threads don't interfere with each
 other, and Java and the OS handle exploiting all the CPUs available.

 You need to decompose your problem into 256 independent tasks.

 It can be trivial for some problems and difficult or perhaps
 impossible for some others.

Even for problems where it appears trivial, there can be hidden
issues, like false cache coherency communication where no actual
sharing is taking place. Or locks that appear to have low contention and
negligible performance impact on ``only'' 8 processors suddenly turn into
bottlenecks. Then there is NUMA. A given address in memory may be
RAM attached to the processor accessing it, or to another processor,
with very different access costs.
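
The lock-contention case can be sketched as below, in Python. This shows
only the structure of the two designs; the function names are invented,
and CPython's global interpreter lock means this particular code will
not show the hardware-level effects (cache-line traffic, NUMA) that the
paragraph above describes.

```python
import threading


def count_with_shared_lock(n_threads, increments):
    # Every increment takes the same lock, so all threads serialize on it.
    total = 0
    lock = threading.Lock()

    def work():
        nonlocal total
        for _ in range(increments):
            with lock:
                total += 1

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


def count_with_private_totals(n_threads, increments):
    # Each thread accumulates in a local variable and publishes once,
    # so there is no lock and almost no shared writing.
    partial = [0] * n_threads

    def work(slot):
        local = 0
        for _ in range(increments):
            local += 1
        partial[slot] = local

    threads = [threading.Thread(target=work, args=(i,))
               for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(partial)
```

Both return n_threads * increments. Accumulating in a local and writing
the shared slot only once is also the usual way to avoid false sharing
in languages where neighbouring slots land on the same cache line.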


Re: multi-core software

2009-06-05 Thread Scott Burson
On Jun 4, 9:46 am, Xah Lee xah...@gmail.com wrote:
 Of interest:

 • Why Must Software Be Rewritten For Multi-Core Processors?
  http://xahlee.org/UnixResource_dir/writ/multi-core_software.html

 plain text version follows.

 --
 Why Must Software Be Rewritten For Multi-Core Processors?

 Xah Lee, 2009-06-04

 I had a revelation today, namely, that it is necessary to rewrite
 software to use multiple processors in order to benefit from them.

 This may sound stupid, but it is a revelation to me. For the past
 decade, the question has been on my mind of why software needs to
 be rewritten to take advantage of multi-processors. In my
 mind, i thought that software is at some fundamental level just
 algorithms, and algorithms have nothing to do with hardware
 implementation aspects such as the number of processors. I always felt
 that all the talk about the need for, or difficulty of, rewriting
 software for multi-processor (or multi-core these days) must be the
 product of the idiocy of industrial imperative coding monkeys. In
 particular, the way some languages such as Java deal with it seems to
 me extremely stupid, e.g. the concept of threads. In my mind, there
 should be a layer between the software and the hardware, such as the
 operating system or the compiler, that should be able to
 automatically handle or compile the software so that it FULLY uses the
 multi-processors when present. In short, i thought that an algorithm
 really has nothing to do with hardware details.

 I never really thought hard about this issue, but today, since i got a
 quad-core PC, i looked into the issue, thought about it, and
 realized the answer. The gist is that an algorithm fundamentally means
 manipulating some hardware. In fact, an algorithm is a step-by-step
 instruction about some form of hardware, even if the hardware may be
 abstract or virtual. For example, take the simplest case of 1+1.
 It is an algorithm, but where is the hardware? You may say it's an
 abstract concept, or a mathematical model. What you call 1+1
 depends on the context, but in our context, those numbers are the
 hardware. To see this, take a more complex example: listing primes
 by sieve. Again, it comes down to “what is a number”? Here, numbers
 can be stones, or arrangements of beads on an abacus: it's hardware!
 As another example, take sorting. To begin with, you have to have
 something to sort, and that's hardware.

 Another way to see this is that, for a given computing problem, there
 are infinitely many algorithms that achieve the same thing. Some
 will be better, requiring fewer steps or less storage.
 All these are concrete manipulation issues, and the thing being
 manipulated, ultimately, we have to call hardware. So, when hardware
 changes, say from one CPU to multi-CPU, there's no way for the
 algorithm to magically change and adapt to the changed hardware. If
 you need an algorithm that is catered to the new hardware, you need a
 new algorithm.

 One might think that there might be an algorithm Omega that
 takes as input old hardware O, new hardware N, and an algorithm A, and
 outputs an algorithm B, such that B has the same behavior as A, but N+B
 performs better than O+A. This is like asking for Strong AI.

 One thing we often forget is that algorithms ultimately translate to
 manipulating hardware. In a modern digital computer, that means
 software algorithms ultimately become machine instructions in the CPU,
 which manipulate the 1s and 0s in registers, or electrical voltages in
 transistors.

 From a more mundane point of view, an automatic system for making
 software work on multi-processors is a problem of breaking a problem
 into discrete units (so that they can be computed in parallel). The
 problem of finding an algorithm is entirely different from the problem
 of breaking a problem into distinct units. The problem of breaking a
 problem into distinct units is an entirely new branch of mathematics.
 For example, take factoring. Factoring is a well-defined mathematical
 problem. There are millions of algorithms to do it, and each class has
 different properties such as number of steps or storage units.
 However, if we look at these algorithms from the point of view of
 distinct units, it's a new perspective on the classification of
 algorithms. Software is in general too ill-defined, fuzzy, and
 complex; the software we use on personal computers, such as email,
 browsers, and games, doesn't even have mathematical models. It doesn't
 even have mathematical models of its inputs and outputs. To talk about
 an automatic system for unitizing software would be more like an AI
 fantasy. Roughly, a term that describes this aspect of research is
 parallel computing.

 The Wikipedia article talks about types of parallelism: bit-level
 parallelism, instruction-level parallelism, data parallelism, and
 task parallelism. Then it also discusses classes of hardware:
 

Re: multi-core software

2009-06-05 Thread Vend
On Jun 6, 1:26 am, Roedy Green see_webs...@mindprod.com.invalid
wrote:
 On Fri, 5 Jun 2009 18:15:00 + (UTC), Kaz Kylheku
 kkylh...@gmail.com wrote, quoted or indirectly quoted someone who
 said :

 Even for problems where it appears trivial, there can be hidden
 issues, like false cache coherency communication where no actual
 sharing is taking place. Or locks that appear to have low contention and
 negligible performance impact on ``only'' 8 processors suddenly turn into
 bottlenecks. Then there is NUMA. A given address in memory may be
 RAM attached to the processor accessing it, or to another processor,
 with very different access costs.

 Could what you are saying be summed up as: the more threads
 you have, the more important it is to keep your threads independent,
 sharing as little data as possible?

Besides technical issues such as cache conflicts and synchronization
latencies, there are more theoretical issues of task decomposability.
It seems it is not always feasible to decompose an algorithm into
subprograms that can be executed in parallel.
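
A small Python sketch of the contrast, with made-up function names. The
first computation splits cleanly because each chunk can be summed
without looking at any other chunk. The second is a chain in which each
step needs the result of the previous one, so as written it offers
nothing to hand to a second thread.

```python
from concurrent.futures import ThreadPoolExecutor


def sum_squares(chunk):
    return sum(x * x for x in chunk)


def parallel_sum_squares(values, workers=4):
    # Decomposable: chunks are independent, and partial sums combine
    # with a plain addition at the end.
    size = max(1, len(values) // workers)
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(sum_squares, chunks))


def collatz_steps(n):
    # Not decomposable as written: step k cannot start before step k-1
    # has produced its value.
    steps = 0
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps
```

One could still run collatz_steps for many different starting values in
parallel, but that is decomposing the workload, not the algorithm.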


Re: multi-core software

2009-06-04 Thread Kaz Kylheku
[Followup-To: header set to comp.lang.lisp.]
On 2009-06-04, Roedy Green see_webs...@mindprod.com.invalid wrote:
 On Thu, 4 Jun 2009 09:46:44 -0700 (PDT), Xah Lee xah...@gmail.com
 wrote, quoted or indirectly quoted someone who said :

• Why Must Software Be Rewritten For Multi-Core Processors?

 Threads have been part of Java since Day 1.

Unfortunately, not sane threads designed by people who actually understand
multithreading.

 The nice thing about Java is that whether you are on a single-core
 processor or a 256-CPU machine (we got to run our Athena Integer Java
 spreadsheet engine on such a beast) does not concern your code.

You are dreaming if you think that there are any circumstances (other than
circumstances in which performance doesn't matter) in which you don't have to
concern yourself about the difference between a uniprocessor and a 256 CPU
machine.


Re: multi-core software

2009-06-04 Thread MRAB

Kaz Kylheku wrote:

[Followup-To: header set to comp.lang.lisp.]
On 2009-06-04, Roedy Green see_webs...@mindprod.com.invalid wrote:

On Thu, 4 Jun 2009 09:46:44 -0700 (PDT), Xah Lee xah...@gmail.com
wrote, quoted or indirectly quoted someone who said :


• Why Must Software Be Rewritten For Multi-Core Processors?

Threads have been part of Java since Day 1.


Unfortunately, not sane threads designed by people who actually understand
multithreading.


The nice thing about Java is that whether you are on a single-core
processor or a 256-CPU machine (we got to run our Athena Integer Java
spreadsheet engine on such a beast) does not concern your code.


You are dreaming if you think that there are any circumstances (other than
circumstances in which performance doesn't matter) in which you don't have to
concern yourself about the difference between a uniprocessor and a 256 CPU
machine.


If you're interested in parallel programming, have a look at Flow-Based
Programming:

http://www.jpaulmorrison.com/fbp/
