Re: [ql-users] The hardware conflict...

ZN Tue, 30 Nov 2004 07:39:27 -0800

>PS: I am really looking forward to ZN's post explaining unsuitability
>issues with coldfire processors. (hah, on topic!)


It would be easy if I could post attachments - MicroAPL provide a cross
compiler (68k to ColdFire) and emulation pack for CF V3 and V4 free of
charge; it's worth getting, if for nothing else, for the documentation,
which gives a lot of insight into this problem. Since I don't have that
option, I am going to quote the relevant parts of the documentation, so
WARNING: THIS IS ONE LONG MAIL!!!

I should also mention that it's not worth spending money on making a QL
compatible using anything but the most capable V4e ColdFire, as the chip
prices are nearly the same and the performance is vastly improved in the
latest version, not to mention all the extra stuff you get on the chip
itself. Incidentally, this means that there is an emulation capability
provided, which can be used to emulate most (but not all!) instructions
that are implemented in the 68k and not in the CF series CPUs.

At QL2004 I only briefly spoke with a few people about this, proposing
that most of the instruction set add-ons introduced by the 68020, 30, 40
and 60 CPUs not be used, as they greatly complicate proper emulation.
Fortunately, in the greatest proportion of all software, the CPU is treated
as a very slightly expanded and fast bog standard 68k. One job that needs
to be done is to carefully and pragmatically decide which extensions, if
any, should be added. Good candidates would be 32-bit multiplication and
division, and possibly floating point instructions (note: V4e ColdFires
have an FPU, but it is simpler than the original full and extended IEEE
implementation in the 68881 and 882 FPUs). Also, it should be decided which
instructions are not to be used at all (a good candidate would be MOVEP),
and which should be deprecated and recommended for avoidance, for
efficiency reasons. Sadly, this goes against some brilliant work done by
other folks, most notably George Gwilt - but at this point, if there is a
way forward for a hardware platform (*), it is doubtful that there is any
other choice.

(*) I still strongly advocate the existence of a hardware platform. One
could consider me biased, surely - but also consider this:
SMSQ/E is a GREAT asset in a world of embedded programming, in which
development systems are notoriously composed of vapourware. Mostly the
hardware is there, but the software either flat out doesn't work or is
completely unhelpful - the developers are left to their own devices to make
things work as intended. The QL community is dwindling, and with it another
great asset: knowledge of efficient embedded programming. In a world where
a control program for an LCD monitor uses up 50k of code, programmers who
know you can fit entire OSs and more into the same space are VERY hard to
find, and also very sought after - it has now come to a point where the
existence of such programming is nearly considered a myth. Selling one
product based on embedded QL technology is likely to be equivalent to the
total sales of a major product in the QL market - the first, given proper
attention, can occur several times every year, with gathering momentum,
the second once every several years. Money earned is by far not the most
important result of this: the addition of a critical mass of developers
who have a clear way to benefit from their work IS - it all filters back
into the QL community. IMHO, this is the way for the QL to survive, and
possibly even thrive in a quiet but important sort of way, doing what it
is best at: reliably solving unique problems.

Anyway, back to the ColdFire dilemma:

Here is an excerpt from MicroAPL's PortASM user's manual:

:quote:

Although the ColdFire architecture is closely related to the 680x0, there
are many simplifications to the instruction set which mean that 680x0
assembler code may require substantial modifications...
Nearly all of the differences are omissions from the 680x0 instruction set
and addressing modes. This means that (with a few important exceptions
detailed later), a 680x0 instruction which is implemented in ColdFire
behaves in exactly the same way under the two architectures. In fact,
almost all user-level (and much supervisor-level) ColdFire code can be run
unchanged on a 68020 or later 680x0 processor. THE CONVERSE, HOWEVER, IS
NOT THE CASE.
In outline, the main omissions fall into five categories:
- Missing addressing modes
- Missing instructions
- Non-availability of word- and byte-forms of nearly all arithmetic and
  logical instructions
- Many instructions act only on registers, not on memory
- Restrictions on available addressing modes for particular instructions
- Simplification of the supervisor-level programming model
In addition to these omissions, the ColdFire version 4 core includes some
new instructions which PortAsm can optionally make use of - in particular
MVS (move-with-sign-extend) and MVZ (move-with-zero-extend).

...Standard RISC processors such as the PowerPC achieve high performance
at the expense of low code density... This all means that programs compiled
for RISC processors tend to be substantially larger than those compiled for
CISC architectures such as the 680x0. This penalty does not greatly matter
for powerful servers or workstations with 64MB or more of RAM, but for some
embedded applications it can be a significant disadvantage, both in terms
of system cost and power consumption. The ColdFire architecture... is
optimized for code written in C or C++, and instructions which are not
frequently generated by compilers are amongst those removed from the
instruction set... In order to regularize the instruction stream, ALL
COLDFIRE INSTRUCTIONS ARE EITHER 2, 4 OR 6 BYTES WIDE, this is why certain
combinations of source and destination operands are not available.

Missing addressing modes:
The ColdFire addressing modes are quite similar to those of the original
68000, i.e. without the extensions introduced in the 68020 and later
processors, but with some differences in indexed addressing. Compared with
a 68020 or later processor, the comparison is as follows:

Fully supported:
Data Register Direct                    D0
Address Register Direct                 A3
Address Register Indirect               (A5)
Post-increment                          (A1)+
Pre-decrement                           -(A7)
Displacement (16-bit displacement)      100(A2)
PC Displacement (16-bit displacement)   100(PC)
Absolute Short                          ($100).W
Absolute Long                           ($220E0).L
Immediate                               #3

Partially supported:
Indexed                                 (10,A2,D3.L*4)
PC Indexed                              (0,PC,D2.L*2)
The restrictions on these two modes are:
(a) The displacement constant is 8-bit only
(b) 'Zero-suppressed' registers are not supported
(c) The Index register can only be handled as a Long. Word-length index
registers are not supported.
(d) The scale factor must be 1, 2, or 4. Scale factors of 8 are not
supported.
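
A rough sketch, not taken from the MicroAPL documentation: restrictions (a)
to (d) can be written down as a small Python check. The function name and
its parameters are made up for illustration, and the 8-bit displacement is
assumed to be signed (-128 to 127), as it is in the 68000 brief extension
word.

```python
def coldfire_indexed_ok(disp, index_size, scale, base_suppressed=False):
    """Return True if an indexed operand satisfies restrictions (a)-(d).

    disp            -- displacement constant
    index_size      -- "W" or "L", the size used for the index register
    scale           -- scale factor applied to the index register
    base_suppressed -- True if a 'zero-suppressed' register is requested
    """
    return (
        -128 <= disp <= 127          # (a) 8-bit displacement only (assumed signed)
        and not base_suppressed      # (b) no zero-suppressed registers
        and index_size == "L"        # (c) index register handled as a Long only
        and scale in (1, 2, 4)       # (d) scale factor 1, 2 or 4, never 8
    )
```

For example, (10,A2,D3.L*4) passes, while the classic 68000 form
0(A2,D3.W) fails on rule (c), which is the case most likely to turn up in
existing QL code.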

Not implemented at all:
Memory-indirect post-indexed            ([12,A3],D2*W,1000)
Memory-indirect pre-indexed             ([12,A3,D2*W],1000)
PC-indirect post-indexed                ([12,PC],D2*W,1000)
PC-indirect pre-indexed                 ([12,PC,D2*W],1000)

NOTE THAT FURTHER RESTRICTIONS MAY BE IMPOSED ON THE ADDRESSING MODES
SUPPORTED BY PARTICULAR INSTRUCTIONS, EVEN IF A PARTICULAR ADDRESSING MODE
IS ITSELF AVAILABLE ON COLDFIRE.

Missing instructions:
A number of instructions are not implemented at all under ColdFire. These
include:

DBcc, EXG, RTR, RTD, CMPM, ROL, ROR, ROXL, ROXR, MOVE16, ABCD, SBCD, NBCD,
BFCHG, BFCLR, BFEXTS, BFEXTU, BFFFO, BFINS, BFSET, BFTST, CALLM, RTM, PACK,
UNPK
CHK, CHK2, CMP2, CAS, CAS2, TAS (supported in V4 core), BKPT, BGND, LPSTOP,
TBLU, TBLS, TBLUN, TBLSN, TRAPV, TRAPcc, MOVEP, MOVES, RESET, ORI to CCR,
EORI to CCR, ANDI to CCR

In addition, DIVS and DIVU (with some differences from the 680x0
equivalents) are available on some ColdFire processors but not others. MULU
and MULS producing a 64-bit result are not implemented, but 16 x 16
producing 32-bit, and 32 x 32 producing (truncated) 32-bit, are available.

[Comment by ZN: Out of the above, DBcc, ORI EORI ANDI to CCR are obvious
problems, and ROL ROR ROXL ROXR although seldom used do appear in QL code.
Most of the others are no great loss... although some bit field
instructions would have been nice...]
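
To give an idea of what replacing a missing rotate involves, here is a
minimal Python model of ROL.L built from the two long shifts and an OR,
which ColdFire does keep. It is only a sketch: the function name is
invented, the count is restricted to 1..31 for simplicity, and only the
result and the C flag are modelled.

```python
MASK32 = 0xFFFFFFFF


def rol_l(value, count):
    """Model ROL.L #count,Dn using only shifts and OR.

    Returns (result, carry). Carry is the last bit rotated out of the top,
    which for a rotate left is also the new bit 0 of the result.
    """
    if not 0 < count < 32:
        raise ValueError("this sketch only handles counts 1..31")
    value &= MASK32
    result = ((value << count) | (value >> (32 - count))) & MASK32
    carry = result & 1
    return result, carry
```

On the real chip this costs a scratch register and three instructions
instead of one, before any flag handling is added.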

Long-word forms only:
Most arithmetic and logical instructions can act on Long words only. This
applies to:

ADD, ADDA, ADDI, ADDQ, ADDX, AND, ANDI, ASL, ASR, CMP*, CMPA, CMPI*, EOR,
EORI, LSL, LSR, NEG, NEGX, NOT, OR, ORI, SUB, SUBA, SUBI, SUBQ, SUBX
*For the ColdFire Version 4 core the CMP and CMPI instructions are fully
supported.

[Comment by ZN: these are a serious problem as they are very often used.
They can be emulated, but emulation can be particularly inefficient if one
absolutely needs to calculate the V flag]
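
As an illustration of why the V flag is the expensive part, here is a
small Python model of ADD.W carried out with wider arithmetic only. It is
a sketch under my own assumptions (the function name and the flag
dictionary are invented); it is not how PortASM or CF68KLib do it.

```python
def add_w(dst, src):
    """Model ADD.W src,dst without using a 16-bit add.

    Returns (new_dst, flags). The upper 16 bits of dst are preserved, as a
    word operation on a 68000 data register would leave them.
    """
    a = dst & 0xFFFF
    b = src & 0xFFFF
    total = a + b
    res = total & 0xFFFF
    flags = {
        "N": (res >> 15) & 1,
        "Z": int(res == 0),
        # Signed overflow: both operands have the same sign and the
        # result has the opposite one.
        "V": ((~(a ^ b) & (a ^ res)) >> 15) & 1,
        "C": (total >> 16) & 1,
    }
    return (dst & 0xFFFF0000) | res, flags
```

N, Z and C fall out almost for free, while V needs the extra XOR/AND
sequence, which is the overhead referred to above.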

MOVEM.W has also been removed from the instruction set in all versions.
In fact, the only instructions which do act on the full set of byte, word
and long operands are CLR, MOVE and TST, together with CMP and CMPI for the
V4 core. EXT.W, EXTB.L and EXT.L survive in all versions, as do MULx.W and
MULx.L

Instructions which act only on registers, not on memory:
Some arithmetic instructions cannot act directly on memory - the
destination must be a register. This applies to:

ADDI, ADDX, ANDI, CMPI, ASL, ASR, LSL, LSR, NEG, NEGX, NOT, EORI, ORI,
SUBI, SUBX, Scc

Note that ADDQ and SUBQ can act directly on memory.

Restrictions on addressing modes for particular instructions:
Even where a particular memory addressing mode does exist in ColdFire, some
instructions are subject to further restrictions. Often, this is because of
the limit of six bytes as the maximum length of a single instruction.
Specific restrictions include:

(a) Some combinations of addressing modes for MOVE are disallowed:
If the source addressing mode is Displacement or PC Displacement, the
destination addressing mode cannot be Indexed or Absolute.
If the source addressing mode is Indexed, PC Indexed or Absolute, the
destination addressing mode cannot be Displacement, Indexed or Absolute.
For the Version 4 core, if the source addressing mode is Immediate and the
operation is a 32-bit move, the destination addressing mode cannot be
Displacement.
(b) The addressing modes for MOVEM are restricted to only Displacement and
Indexed - no Pre-decrement or Post-increment!
(c) For BTST, BSET, BCLR and BCHG, if the source operand is a static bit
number, the destination cannot be Indexed or Absolute memory.
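
The MOVE rules in (a) are easy to misread, so here they are transcribed
literally into a Python check. This is only a sketch: the mode names are
labels I made up, and the Immediate rule is applied only to the V4 core
because that is all the text above states.

```python
def move_allowed(src, dst, size="L", core=4):
    """Return False if rule (a) above forbids MOVE src,dst.

    Modes are plain strings: "disp", "pc_disp", "indexed", "pc_indexed",
    "absolute", "immediate", or anything else for the simple modes
    (register direct, indirect, post-increment, pre-decrement).
    """
    if src in ("disp", "pc_disp") and dst in ("indexed", "absolute"):
        return False
    if src in ("indexed", "pc_indexed", "absolute") and dst in (
        "disp", "indexed", "absolute"
    ):
        return False
    # Stated for the Version 4 core only.
    if core == 4 and src == "immediate" and size == "L" and dst == "disp":
        return False
    return True
```

So move.l 4(a0),8(a1) survives, but move.l 4(a0),0(a1,d0.l) and
move.l #$12345678,4(a0) (on V4) have to be split in two via a register.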

Miscellaneous Omissions:
There are a few miscellaneous omissions for specific instructions:
LINK.L is not supported
MOVE to CCR/SR: Source must be Immediate or Data Register
MOVE from CCR/SR: Destination must be data register
In the Version 4 core, BSR and Bcc accept 8-, 16- or 32-bit displacement as
in most 680x0 processors, Version 2 and 3 only accept 8- and 16-bit.

Instructions which behave differently from the 680x0 equivalent:
There are a few subtle cases where the ColdFire instruction is not exactly
the same as its 680x0 counterpart. The most important of these is that
multiply instructions (MULU and MULS) do not set the overflow bit. This
means that a 680x0 code sequence which checks for overflow on multiply may
assemble and apparently run under ColdFire, but give incorrect results. ASL
and ASR also differ in that they do not set the overflow bit - but this is
less likely to cause problems for real programs!
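
A short Python model of what code relying on V after a 32 x 32 -> 32
signed multiply would have to compute for itself. This is a sketch with
invented function names; the overflow rule used is the 68020 one, i.e.
the product does not fit in 32 signed bits.

```python
def to_signed32(x):
    """Interpret the low 32 bits of x as a two's complement value."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x & 0x80000000 else x


def muls_l(a, b):
    """Model MULS.L a,b with a truncated 32-bit result.

    Returns (low32, overflow). The low word is the truncated result
    described above; overflow is the V flag a 68020 would report and
    which the text says ColdFire does not set.
    """
    product = to_signed32(a) * to_signed32(b)
    low = product & 0xFFFFFFFF
    overflow = product != to_signed32(low)
    return low, overflow
```

Any 680x0 sequence of the form 'muls.l ...; bvs error' therefore needs
an explicit range check like this one, or a wider multiply done in
software.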

New Instructions in ColdFire Version 4 Architecture:
In Version 4 of the ColdFire architecture as well as re-introducing some
instructions present in 680x0 but missing from earlier ColdFire cores (e.g.
CMP.B/W, BRA.L, BSR.L, Bcc.L), there are two new instructions which PortAsm
can make use of:
MVS <ea>,Dn Move byte or word and sign-extend to 32-bits in Dn
MVZ <ea>,Dn Move byte or word and zero-extend to 32-bits in Dn
[Comment by ZN: these are used for much more efficient emulation of
unimplemented .B and .W logic and arithmetic instructions]

Simplification of the supervisor programming model:
The most important simplification in ColdFire's supervisor-level model is
that there is only one stack pointer, shared for all code including
interrupts, supervisor-level services, and user code. It follows from this
that, on ColdFire, it is never safe to write below the stack, since any
interrupt which occurs would overwrite the stored data. (Writing below the
stack, though not recommended, is possible in some 680x0 systems in user
mode, because interrupts cause a switch to the Interrupt or Supervisor
Stack Pointer).
A further issue is that ColdFire processors automatically align the stack
to a four-byte boundary when an exception occurs, which can cause problems
if code is reading or writing at a fixed offset from the stack pointer. In
fact, it is strongly recommended (for performance reasons) that the
ColdFire stack should be kept long-word aligned at all times.
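
A rough Python model of the alignment issue, not part of the MicroAPL
text. It assumes the exception frame is two long words (8 bytes); treat
that figure, and the function name, as assumptions of this sketch.

```python
FRAME_BYTES = 8  # assumed size of the exception stack frame


def exception_entry(sp):
    """Model the stack pointer adjustment on exception entry.

    Returns (new_sp, pad): the stack pointer seen by the handler and the
    number of filler bytes inserted to reach a four-byte boundary.
    """
    aligned = sp & ~3 & 0xFFFFFFFF   # round down to a multiple of four
    pad = sp - aligned
    new_sp = aligned - FRAME_BYTES
    return new_sp, pad
```

With sp = $2002 the handler sees $1FF8 and two filler bytes, so anything
the interrupted code left at a fixed offset from the old stack pointer is
now two bytes further away than a handler written for the 680x0 expects.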

:endquote:

Now, there is no 'right way' to make 68k programs CF compatible. Obviously,
for the ones where the assembly source is available, PortASM seems the most
logical way - however, it is worth looking in PortASM's documentation to
see what it does, how, and why. The program is actually quite clever, but
even so, knowing what the programmer originally intended can save a LOT of
code with very little modification. The most logical choice for an
'interactive rewrite' approach, where PortASM is used but the end result is
carefully examined and simplified or amended by hand, would of course be
SMSQ/E itself, as well as all low level code such as drivers.

For the rest of the programs, it is possible to resort to emulation.
That being said, imposing sensible restrictions (most of which are in place
anyway) on which instructions can or cannot be used can simplify this
task quite considerably. This is all the more important knowing that the
V4e ColdFire implements 32k of very fast RAM on the chip (in addition to 2x
32k cache), which is ideally suited to holding emulation code. This means
that all such efficiency critical code needs to fit into 32k. Given that
the basic OS fits into less, in the QL world this should not be a problem.
In particular, most of the attention in hand-coding of emulations should be
concentrated within the OS, handling differences caused by the User and
Supervisor stacks now being the same physical register, most of which will
concern the handling of TRAPs.

However, emulation is not perfect. Some code really can only work if
changed or patched. Here is an excerpt from the CF emulation library
documentation, also by MicroAPL:

:quote:

CF68KLib will install its own handlers for the 'Illegal Instruction' and
'Address Error' exceptions which can occur when unimplemented 680x0
instructions are executed on ColdFire. These handlers will perform the same
function as the missing instruction and then return execution to the
instruction which follows. The existence of CF68KLib is thus transparent to
your application code except that it can incur a substantial performance
penalty.
It is occasionally necessary to make minor modifications to your 680x0
program before it can be run under CF68KLib. This is because there are a
very small number of 680x0 instructions which are also legal in ColdFire
(and hence do not cause an exception) but which do not behave identically.
As an example, the MULS instruction executes identically under ColdFire
except that it does not set the Overflow flag in the condition codes
register.

The 68000 and 68010 versions of CF68KLib depart slightly from the behaviour
of the original processor when accessing memory. When the 68000-processor
version of CF68KLib accesses memory it uses the full 32-bit 68K address
(adding the 32-bit base address offset), rather than the 24-bit address
which a real 68000 would use. In addition, it will not signal an exception
on unaligned memory accesses (e.g. reading a 16- or 32-bit value at an odd
address). In this respect they behave like a 68020 or higher processor.
[Comment by ZN: most programs on the QL are written for the above sort of
environment anyway!]
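
For the few programs that do rely on 24-bit addressing, the difference
looks like this in a small Python model. Both function names are
hypothetical, and the base offset handling is only my reading of the
paragraph above.

```python
def bus_address_68000(addr):
    """A real 68000 only drives 24 address bits, so the top byte is ignored."""
    return addr & 0x00FFFFFF


def bus_address_cf68klib(addr, base=0):
    """Model of the library behaviour described above: the full 32-bit
    address is used, with a 32-bit base offset added."""
    return (addr + base) & 0xFFFFFFFF
```

A pointer with a flag byte tucked into its top 8 bits, say $80020000,
reaches $020000 on a 68000 but $80020000 here, so code using that old
trick must mask the pointer before use.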

Certain 680x0 instructions which have no equivalent in ColdFire cannot
sensibly be handled by CF68KLib. These instructions are:
CAS: Compare-and-Swap instruction used to implement semaphores in a
multi-processor environment.
CAS2 (68020 and higher): Similar to CAS
MOVES (68010 and higher): Move Address Space instruction
BKPT (68020 and higher): Hardware breakpoint instruction
CALLM (68020 only): Call Module instruction uses external hardware for
access control.
RTM (68020 only): Similar to CALLM
[Comment by ZN: most (all?) of these are never used in user code, and only
CAS and MOVES could possibly be used in OS code which would normally
initialize a 68020+ CPU and which would have to be rewritten anyway for
ColdFire. The same is true for some special registers, also discussed in
the CFlib documentation]

The possible conditions under which CF68KLib detects an emulation error are
as follows:
1) occurs if the 680x0 instruction was something like:
move.l -4(a7),(a0,d0.w)
The problem with this instruction is that it is not legal in ColdFire and
hence causes an exception, but by the time CF68KLib is called to begin
emulating the instruction the exception stack frame has over-written the
data at -4(a7)
2) occurs if the 680x0 instruction was something like:
move.l (a7)+,(a0,d0.w)
The problem with this instruction is that the exception does not occur
until the ColdFire processor is half-way through processing it - it
successfully fetches the source operand from (a7)+ but then discovers that
the destination operand is not legal for ColdFire. It then proceeds to take
an exception, but the exception stack frame over-writes the source operand.
[Comment by ZN: most if not all of the above is not part of standard QL
code writing practice, but may appear in C compiled programs - research is
needed to check if C68 generates such sequences.]
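
Case 1 can be pictured with a toy Python model of the stack. It is only a
sketch: memory is a dictionary of long words, the stack pointer is assumed
to be long-word aligned already, and the two-long-word frame layout is an
assumption of the model.

```python
def read_below_stack_after_trap(sp, saved_value):
    """Show what an emulator finds at -4(a7) once the exception has hit.

    sp          -- stack pointer of the interrupted program (aligned)
    saved_value -- long word the program had parked at -4(a7)
    """
    memory = {}
    # The program writes below the stack pointer.
    memory[sp - 4] = saved_value
    # The illegal instruction traps; the frame lands on the same stack,
    # immediately below sp, and covers that long word.
    memory[sp - 8] = "frame: format/vector/SR"
    memory[sp - 4] = "frame: return PC"
    # The emulation handler now tries to fetch the original operand.
    return memory[sp - 4]
```

Whatever the program stored is gone before the handler gets a chance to
look at it, which is why the library can only report an error.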

The principle behind the successful operation of CF68KLib is that all 680x0
instructions are either legal in ColdFire - and behave identically - or
cause an exception which CF68KLib can catch to handle the differences.
Unfortunately, ...there are a very few 680x0 instructions for which this is
not the case. To recap:
1. Certain 68020 multiply/divide instructions don't trap out and don't give
the same result:
MULS.L <ea>,Dh:Dl       (Signed multiply: 32x32 -> 64)
MULU.L <ea>,Dh:Dl       (Unsigned multiply: 32x32 -> 64)
DIVS.L <ea>,Dr:Dq       (Signed divide: 64/32 -> 32r:32q)
DIVSL.L <ea>,Dr:Dq      (Signed divide: 32/32 -> 32r:32q)
DIVU.L <ea>,Dr:Dq       (Unsigned divide: 64/32 -> 32r:32q)
DIVUL.L <ea>,Dr:Dq      (Unsigned divide: 32/32 -> 32r:32q)
[Comment by ZN: the 68000 does not implement most of these anyway and
99.99% of all QL code is 68000]

2. The multiply instructions (MULU and MULS) do not set the overflow bit.
This means that a 680x0 code sequence which checks for overflow on multiply
may run under ColdFire, but give incorrect results.

3. The arithmetic shift instructions (ASL and ASR) also differ in that they
do not set the overflow bit

4. The instructions "MOVE.B <ea>,-(A7)" and "MOVE.B (A7)+,<ea>" only change
the stack pointer by one - on 680x0 the stack pointer would change by
two.
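
Difference 4 in a few lines of Python, as a sketch only (the function
name and the "68k"/"coldfire" labels are made up):

```python
def sp_after_byte_push(sp, cpu):
    """Stack pointer after MOVE.B <ea>,-(A7).

    On the 680x0 a byte push moves A7 by two to keep it even; per the
    text above, ColdFire moves it by one.
    """
    step = 2 if cpu == "68k" else 1
    return sp - step
```

Starting from an even address, the ColdFire stack pointer ends up odd
after a single byte push, so any code that later pops a word or long
word, or assumes the 680x0 layout, sees different data.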

IF ANY OF THESE DIFFERENCES AFFECT THE CORRECT OPERATION OF YOUR 680X0
PROGRAM YOU WILL NEED TO MAKE CHANGES TO THE SOURCE CODE!

To handle 2. or 3. or 4. you need to recode the source to avoid using the
problem instruction... What is needed is a way of forcing the ColdFire
processor to trap so that CF68KLib can handle the instruction. CF68KLib
reassigns one of the 16-bit opcodes, 0x4E00, which is not used in 680x0 or
ColdFire and which causes an exception. This is used as an 'escape' telling
CF68KLib to emulate the next instruction. For example if your source code
contains:
.short 0x4E00
divsl.l d0,d1:d2
...then CF68KLib will catch the exception caused by the 0x4E00 opcode and
emulate the DIVSL instruction which follows as though it had itself caused
the exception.

:endquote:

All of this seems like a whole lot of bother - but the prospect of running
SMSQ/E on a V4e ColdFire is quite appealing. This CPU has substantial
computing power (out of reach of foreseeable emulated platforms for a while
yet), and includes on chip nearly all one needs to build a very full
featured computer. The particular chip of choice would be the 5474 or 75.
It interfaces to 266MHz DDR RAM, performs over 400MIPS at 266MHz and
includes a PCI2.2 bus, USB2.0 device controller, 10/100 Ethernet, MMU, FPU,
EMAC DSP extension, serial ports etc etc... using less than 1.5W of power,
and costing about $25 in large quantity. With V5 cores just being announced
by Motorola/Freescale, running up to 1.6x faster, it seems the logical
choice...

Using ARM would require a LOT more time invested in writing an emulator,
plus getting familiar with new development tools (if not getting them to
work) - development which can in fact be done on 68k for CF processors.
Using the same approach with PPC also adds the 'reinventing the wheel'
issue, as Apple has done this already with the PPC based Mac. Both would
either not be able to provide comparable performance to CF at a given
price, or a lot more money would have to be put into the hardware to
provide acceptable performance, unless thousands of man-hours of work are
invested in emulator technology - and if PPC was used, some attorney from
Apple would no doubt come a-knockin'. Using a Crusoe, for example, could
ultimately yield the most efficient emulation (out of reach of PC based
emulators for the foreseeable future) at a reasonable hardware price, but
again even more thousands of man-hours of work would have to be invested in
the emulator - though one could sell this emulator elsewhere for very
decent money, I am sure. Other exotic approaches include using DSPs, but
these continually suffer from roadmap problems - they are performance
driven, not necessarily retaining compatibility.

N.

_______________________________________________
QL-Users Mailing List
http://www.q-v-d.demon.co.uk/smsqe.htm
