A number of groups have tried to develop extremely parallel processors,
but all seem to have gained little traction.
There was the XPU 128, the Epiphany (http://www.adapteva.com/) and,
more recently, the Xeon Phi and AMD Epyc.
At one point I remember reading an article about Sun developing an
asynchronous CPU, which would be interesting.
All these processors run into the same set of problems:
1) x86 silicon is amazingly cheap.
2) Supporting multiple CPU architectures multiplies the software
support work needed for each new architecture.
3) Very little software is capable of truly taking advantage of many
parallel threads without really funky compilers and software design
tools.
4) Having designed a fancy CPU, most companies try very hard to keep
their proprietary knowledge under their own control, whereas the x86
instruction set must be just about open source nowadays.
5) Getting motherboard manufacturers to take a chance on a new CPU is
not an easy thing.
My benchmark for processor success is: do several of Asus, Supermicro,
Tyan, Gigabyte et al. make a motherboard for this CPU?
Even people with deep pockets, like DEC with their Alpha CPU and IBM
with their Power CPUs, have not been able to make significant inroads
into the commodity server world.
MIPS has had some luck with low- to mid-range systems for routers and
storage, but their server business is long gone with the death of SGI.
Sun/Oracle has had some luck with SPARC, but not all that much outside
their own use; I am just speculating, but I would bet that Sun/Oracle
sells more x86 systems than SPARC systems.
ARM seems to be having some luck, but I believe that luck comes from
its popularity in the small-systems world sliding into support for
larger systems, not from being designed for servers from the get-go.
I am a bit of a processor geek and have put a lot of effort in the
past into elegant processors that just seem to go nowhere.
I would love to see some technologies other than the current von
Neumann somewhat-parallel SMP, but I have a sad feeling that will be a
long time coming.
With the latest screw-up from Intel and the huge exploit surface that
is the Intel ME, someone may be able to get some traction by coming up
with a processor that is designed and verified for security.
On 01/29/2018 05:36 PM, David Collier-Brown via talk wrote:
Kunle Olukotun didn't like systems that wasted their time stalled on
loads and branches. He and his team at Afara Websystems therefore
designed a non-speculating processor that did work without waits. It
became the Sun T1.
Speed without speculating
The basic idea is to have more decoders than ALUs, so you can have
lots of threads competing for an ALU. If, for example, thread 0 comes
to a load, it will stall, so on the next instruction thread 1 gets the
ALU, and runs... until it stalls and thread 2 gets the ALU. Ditto for
thread 3, and control goes back to thread 0, which has completed a
multi-cycle fetch from cache and is ready to proceed once more.
That is the basic idea of the Sun T-series processors.
The strength is that the ALUs are never waiting for work. The weakness
is that individual threads still have to wait for data to come from cache.
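To make that concrete, here is a toy Python sketch of the scheduling
idea (purely illustrative, not the real T1 pipeline; the thread count
and the miss latency are invented):

# Toy model of barrel-style scheduling: one ALU shared by four threads.
# A thread that issues a load stalls for MISS_CYCLES; each cycle the
# ALU goes to some thread that is ready to run.
MISS_CYCLES = 3

class Thread:
    def __init__(self, name, program):
        self.name = name
        self.program = program      # sequence of "alu" / "load" ops
        self.pc = 0
        self.stall = 0              # cycles left until ready again

    def ready(self):
        return self.stall == 0 and self.pc < len(self.program)

threads = [Thread("t%d" % i, ["alu", "load", "alu"]) for i in range(4)]

for cycle in range(12):
    for t in threads:
        if t.stall:
            t.stall -= 1            # one cycle closer to data arriving
    runnable = [t for t in threads if t.ready()]
    if not runnable:
        continue
    t = runnable[cycle % len(runnable)]  # round-robin among ready threads
    op = t.program[t.pc]
    t.pc += 1
    if op == "load":
        t.stall = MISS_CYCLES       # waits on cache; the ALU moves on
    print("cycle %2d: %s runs %s" % (cycle, t.name, op))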
You can improve on that
Now imagine it isn't entire ALUs that are the available resources, it's
individual ALU components, like adders. Now the scenario becomes
* thread 0 stalls
* thread 1 gets an adder
* thread 2 gets a compare (really a subtracter)
* thread 3 gets a branch unit, and will probably need to wait in the
next cycle
* thread 4 gets an adder
* thread 5 gets an FPU
... and so on. Each cycle, the hardware assigns as many ALU components
as it has available to threads, all of which can run. Only the stalled
threads are waiting, and they don't need ALU bits to do that.
Now more threads can run at the same time, the ALU components are
(probabilistically) all busy, and we have increased capacity. But
individual threads are still waiting for cache...
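Sticking with the toy model, one cycle of handing out individual
components might look like the following Python sketch (the unit names
and counts are invented; a real allocator would be hardware):

# Hand out individual function units, not whole ALUs, for one cycle.
units = {"adder": 2, "subtracter": 1, "branch": 1, "fpu": 1}

# What each thread wants this cycle; a stalled thread wants nothing.
wants = [("t0", None), ("t1", "adder"), ("t2", "subtracter"),
         ("t3", "branch"), ("t4", "adder"), ("t5", "fpu")]

free = dict(units)
for thread, unit in wants:
    if unit is None:
        print("%s: stalled, waiting on cache" % thread)
    elif free.get(unit, 0) > 0:
        free[unit] -= 1
        print("%s: issues on a %s" % (thread, unit))
    else:
        print("%s: no %s free, waits a cycle" % (thread, unit))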
Do I feel lucky?
In principle, we could allocate two adders to thread 5, one doing the
current instruction and another doing a subsequent, non-dependent
instruction. It's not speculative, but it is out-of-order. That makes
some threads twice as fast when doing non-interacting calculations.
Allocate it three adders and it's three times as fast.
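The test that lets a thread claim that second adder is just a
register-dependence check between the two instructions, along these
lines (a sketch; the register names are invented):

# An instruction is (dest, src1, src2). Two instructions can share a
# cycle only if neither reads or writes the other's destination.
def independent(a, b):
    a_dest, a_srcs = a[0], a[1:]
    b_dest, b_srcs = b[0], b[1:]
    return (b_dest != a_dest and        # no write-after-write
            b_dest not in a_srcs and    # no write-after-read
            a_dest not in b_srcs)       # no read-after-write

i0 = ("r1", "r2", "r3")     # r1 = r2 + r3
i1 = ("r4", "r5", "r6")     # r4 = r5 + r6: independent of i0
i2 = ("r7", "r1", "r6")     # r7 = r1 + r6: needs i0's result

print(independent(i0, i1))  # True  -- both adds can issue this cycle
print(independent(i0, i2))  # False -- i2 must wait for i0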
If we're prepared to have more ALU components than decoders, and to
decode deeply enough that we are likely to find lots of non-dependent
instructions, then we can be executing multiple instructions at once in
multiple streams, and probabilistically get /startlingly/ better
performance.
I can see a new kind of optimizing compiler, too: one which tries to
group non-dependent instructions together.
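A sketch of what such a pass might do: give each instruction a level
one deeper than anything it depends on, then emit level by level, so
that instructions within a level are mutually independent (again just
an illustration, not a real scheduler):

# Group instructions (dest, src1, src2) into levels; everything within
# a level is independent of everything else in that level.
def depends_on(later, earlier):
    return (earlier[0] in later[1:] or   # reads earlier's result
            later[0] == earlier[0] or    # rewrites the same register
            later[0] in earlier[1:])     # clobbers an earlier source

def group_independent(program):
    levels = []
    for i, ins in enumerate(program):
        deps = [levels[j] for j in range(i) if depends_on(ins, program[j])]
        levels.append(1 + max(deps, default=0))
    groups = {}
    for ins, lvl in zip(program, levels):
        groups.setdefault(lvl, []).append(ins)
    return [groups[k] for k in sorted(groups)]

program = [("r1", "r2", "r3"),  # r1 = r2 + r3
           ("r4", "r1", "r3"),  # needs r1
           ("r5", "r6", "r7"),  # independent of the others
           ("r8", "r4", "r5")]  # needs r4 and r5
for group in group_independent(program):
    print(group)  # three groups: {i0, i2}, then {i1}, then {i3}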
Conclusion
Is this what happens in a T5? That's a question for a hardware
developer: I have no idea... yet
Links:
https://en.wikipedia.org/wiki/Kunle_Olukotun
https://en.wikipedia.org/wiki/Afara_Websystems
https://web.archive.org/web/20110720050850/http://www-hydra.stanford.edu/~kunle/
--
David Collier-Brown, | Always do right. This will gratify
System Programmer and Author | some people and astonish the rest
[email protected] | -- Mark Twain
--
Alvin Starr || land: (905)513-7688
Netvel Inc. || Cell: (416)806-0133
[email protected] ||
---
Talk Mailing List
[email protected]
https://gtalug.org/mailman/listinfo/talk