At least on SPARC processors, v8 and newer, any double-precision math
(including longs) is performed with a single instruction, just like for
a 32-bit datum. Loads and stores of 8-byte datums are also handled via
a single instruction. The urban myth that 64-bit math is
different/better on a 64-bit processor is just that; yes, some lower-end
processors would emulate/trap those instructions, but that's an
implementation detail, not architecture. I believe that this is all
true for other RISC processors as well.

The 64-bit API on UltraSPARCs does bring along some extra FP registers,
IIRC.

It's very different on x86.
64-bit x86 (e.g. the Opteron) has more general-purpose registers: 16,
versus only 8 on the base 32-bit x86, where they are very scarce. This
alone is very important. There are other factors as well.

Solaris, at least, provided support for far more than 4GB of physical
memory on 32-bit kernels. A newer 64-bit kernel might be more
efficient, but only because the time was taken to support large
page sizes and more efficient data structures. It's nothing intrinsic
to a 32- vs 64-bit kernel.

Well, with a large working set, a processor that can directly address
more than 4GB of memory will be a lot faster than one that can't and
has to play with the MMU and paging units!