At least on SPARC processors, V8 and newer, double-precision floating-point math is performed with a single instruction, just like for a 32-bit datum. Loads and stores of 8-byte data are also handled by a single instruction. (64-bit integer arithmetic on V8 is the exception, since its integer registers are only 32 bits wide; V9 widened them.) The urban myth that 64-bit math is different or better on a 64-bit processor is just that. Yes, some lower-end processors would emulate or trap those instructions, but that's an implementation detail, not architecture. I believe this is all true for other RISC processors as well.

The 64-bit API on UltraSPARCs does bring along some extra FP registers, IIRC.

        It's very different on x86.
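64-bit x86, like the Opteron, has more registers, which are very scarce on the base x86: 8 general-purpose registers in 32-bit mode versus 16 in 64-bit mode. This alone is very important. There are other factors as well; for example, 64-bit mode also doubles the SSE registers from 8 to 16.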

Solaris, at least, provided support for far more than 4GB of physical memory on 32-bit kernels. A newer 64-bit kernel might be more efficient, but that's just because the time was taken to support large page sizes and more efficient data structures. It's nothing intrinsic to a 32- vs. 64-bit kernel.

Well, with a large working set, a processor that can directly address more than 4GB of memory will be a lot faster than one that can't and has to play games with the MMU and paging units to reach the rest!

