>PS: I am really looking forward to ZN's post explaining unsuitability >issues with coldfire processors. (hah, on topic!)
It would be easy if I could post attachments. MicroAPL provide a cross compiler (68k to ColdFire) and an emulation pack for CF V3 and V4 free of charge; it's worth getting if only for the documentation, which gives lots of insight into this problem. Since I don't have that option I am going to quote the relevant parts of the documentation, so WARNING: THIS IS ONE LONG MAIL!!!

I should also mention that it's not worth spending money on making a QL compatible using anything but the most capable V4e ColdFire, as the chip prices are nearly the same and the performance is vastly improved in the latest version, not to mention all the extra stuff you get on the chip itself. Incidentally, this means that there is an emulation capability provided, which can be used to emulate most (but not all!) instructions that are implemented in the 68k and not in the CF series CPUs.

At QL2004 I only briefly spoke with a few people about this, proposing that most of the instruction set add-ons introduced by the 68020, 30, 40 and 60 CPUs not be used, as they greatly complicate proper emulation. Fortunately, the great majority of software treats the CPU as a very slightly expanded and fast bog standard 68k. One job that needs to be done is to decide, carefully and pragmatically, which extensions (if any) should be added. Good candidates would be 32-bit multiplication and division, and possibly floating point instructions (note: V4e ColdFires have an FPU, but it is simpler than the original full and extended IEEE implementation in the 68881 and 68882 FPUs). Also, it should be decided which instructions are not to be used at all (a good candidate would be MOVEP), and which should be deprecated and recommended for avoidance, for efficiency reasons. Sadly, this goes against some brilliant work done by other folks, most notably George Gwilt, but at this point, if there is a way forward for a hardware platform (*), it is doubtful that there is any other choice.

(*) I still strongly advocate the existence of a hardware platform. One could consider me biased, surely, but also consider this: SMSQ/E is a GREAT asset in the world of embedded programming, in which development systems are notoriously composed of vapourware. Mostly the hardware is there, but the software either flat out doesn't work or is completely unhelpful, and the developers are left to their own devices to make things work as intended. The QL community is dwindling, and with it another great asset: knowledge of efficient embedded programming. In a world where a control program for an LCD monitor uses up 50k of code, programmers who know you can fit entire OSs and more into the same space are VERY hard to find, and also very sought after; it has now come to the point where the existence of such programming is nearly considered a myth. Selling one product based on embedded QL technology is likely to be equivalent to the total sales of a major product in the QL market. The first, given proper attention, can occur several times every year, with gathering momentum; the second, once every several years. Money earned is by far not the most important result of this: the addition of a critical mass of developers who have a clear way to benefit from their work IS, as it all filters back into the QL community. IMHO, this is the way for the QL to survive, and even possibly thrive in a quiet but important sort of way, doing what it is best at: reliably solving unique problems.

Anyway, back to the ColdFire dilemma. Here is an excerpt from MicroAPL's PortASM user's manual:

:quote:
Although the ColdFire architecture is closely related to the 680x0, there are many simplifications to the instruction set which mean that 680x0 assembler code may require substantial modifications...

Nearly all of the differences are omissions from the 680x0 instruction set and addressing modes.
This means that (with a few important exceptions detailed later), a 680x0 instruction which is implemented in ColdFire behaves in exactly the same way under the two architectures. In fact, almost all user-level (and much supervisor-level) ColdFire code can be run unchanged on a 68020 or later 680x0 processor. THE CONVERSE, HOWEVER, IS NOT THE CASE.

In outline, the main omissions fall into five categories:

- Missing addressing modes
- Missing instructions
- Non-availability of word- and byte-forms of nearly all arithmetic and logical instructions
- Many instructions act only on registers, not on memory
- Restrictions on available addressing modes for particular instructions
- Simplification of the supervisor-level programming model

In addition to these omissions, the ColdFire version 4 core includes some new instructions which PortAsm can optionally make use of - in particular MVS (move-with-sign-extend) and MVZ (move-with-zero-extend).

...Standard RISC processors such as the PowerPC achieve high performance at the expense of low code density... This all means that programs compiled for RISC processors tend to be substantially larger than those compiled for CISC architectures such as the 680x0. This penalty does not greatly matter for powerful servers or workstations with 64MB or more of RAM, but for some embedded applications it can be a significant disadvantage, both in terms of system cost and power consumption.

The ColdFire architecture... is optimized for code written in C or C++, and instructions which are not frequently generated by compilers are amongst those removed from the instruction set... In order to regularize the instruction stream, ALL COLDFIRE INSTRUCTIONS ARE EITHER 2, 4 OR 6 BYTES WIDE; this is why certain combinations of source and destination operands are not available.

Missing addressing modes:

The ColdFire addressing modes are quite similar to those of the original 68000, i.e.
without the extensions introduced in the 68020 and later processors, but with some differences in indexed addressing. Compared with a 68020 or later processor, the comparison is as follows:

Fully supported:

  Data Register Direct                     D0
  Address Register Direct                  A3
  Address Register Indirect                (A5)
  Post-increment                           (A1)+
  Pre-decrement                            -(A7)
  Displacement (16-bit displacement)       100(A2)
  PC Displacement (16-bit displacement)    100(PC)
  Absolute Short                           ($100).W
  Absolute Long                            ($220E0).L
  Immediate                                #3

Partially supported:

  Indexed                                  (10,A2,D3.L*4)
  PC Indexed                               (0,PC,D2.L*2)

The restrictions on these two modes are:

(a) The displacement constant is 8-bit only
(b) 'Zero-suppressed' registers are not supported
(c) The Index register can only be handled as a Long. Word-length index registers are not supported.
(d) The scale factor must be 1, 2, or 4. Scale factors of 8 are not supported.

Not implemented at all:

  Memory-indirect post-indexed             ([12,A3],D2*W,1000)
  Memory-indirect pre-indexed              ([12,A3,D2*W],1000)
  PC-indirect post-indexed                 ([12,PC],D2*W,1000)
  PC-indirect pre-indexed                  ([12,PC,D2*W],1000)

NOTE THAT FURTHER RESTRICTIONS MAY BE IMPOSED ON THE ADDRESSING MODES SUPPORTED BY PARTICULAR INSTRUCTIONS, EVEN IF A PARTICULAR ADDRESSING MODE IS ITSELF AVAILABLE ON COLDFIRE.

Missing instructions:

A number of instructions are not implemented at all under ColdFire. These include:

DBcc, EXG, RTR, RTD, CMPM, ROL, ROR, ROXL, ROXR, MOVE16, ABCD, SBCD, NBCD, BFCHG, BFCLR, BFEXTS, BFEXTU, BFFFO, BFINS, BFSET, BFTST, CALLM, RTM, PACK, UNPK, CHK, CHK2, CMP2, CAS, CAS2, TAS (supported in V4 core), BKPT, BGND, LPSTOP, TBLU, TBLS, TBLUN, TBLSN, TRAPV, TRAPcc, MOVEP, MOVES, RESET, ORI to CCR, EORI to CCR, ANDI to CCR

In addition, DIVS and DIVU (with some differences from the 680x0 equivalents) are available on some ColdFire processors but not others. MULU and MULS producing a 64-bit result are not implemented, but 16 x 16 producing 32-bit, and 32 x 32 producing (truncated) 32-bit, are available.

[Comment by ZN: Out of the above, DBcc and ORI, EORI and ANDI to CCR are obvious problems, and ROL, ROR, ROXL and ROXR, although seldom used, do appear in QL code. Most of the others are no great loss... although some bit field instructions would have been nice...]

Long-word forms only:

Most arithmetic and logical instructions can act on Long words only. This applies to:

ADD, ADDA, ADDI, ADDQ, ADDX, AND, ANDI, ASL, ASR, CMP*, CMPA, CMPI*, EOR, EORI, LSL, LSR, NEG, NEGX, NOT, OR, ORI, SUB, SUBA, SUBI, SUBQ, SUBX

*For the ColdFire Version 4 core the CMP and CMPI instructions are fully supported.

[Comment by ZN: these are a serious problem as they are very often used. They can be emulated, but emulation can be particularly inefficient if one absolutely needs to calculate the V flag.]

MOVEM.W has also been removed from the instruction set in all versions. In fact, the only instructions which do act on the full set of byte, word and long operands are CLR, MOVE and TST, together with CMP and CMPI for the V4 core. EXT.W, EXTB.L and EXT.L survive in all versions, as do MULx.W and MULx.L

Instructions which act only on registers, not on memory:

Some arithmetic instructions cannot act directly on memory - the destination must be a register. This applies to:

ADDI, ADDX, ANDI, CMPI, ASL, ASR, LSL, LSR, NEG, NEGX, NOT, EORI, ORI, SUBI, SUBX, Scc

Note that ADDQ and SUBQ can act directly on memory.

Restrictions on addressing modes for particular instructions:

Even where a particular memory addressing mode does exist in ColdFire, some instructions are subject to further restrictions. Often, this is because of the limit of six bytes as the maximum length of a single instruction. Specific restrictions include:

(a) Some combinations of addressing modes for MOVE are disallowed:
If the source addressing mode is Displacement or PC Displacement, the destination addressing mode cannot be Indexed or Absolute.
If the source addressing mode is Indexed, PC Indexed or Absolute, the destination addressing mode cannot be Displacement, Indexed or Absolute.
For the Version 4 core, if the source addressing mode is Immediate and the operation is a 32-bit move, the destination addressing mode cannot be Displacement.
(b) The addressing modes for MOVEM are restricted to only Displacement and Indexed - no Pre-decrement or Post-increment!
(c) For BTST, BSET, BCLR and BCHG, if the source operand is a static bit number, the destination cannot be Indexed or Absolute memory.

Miscellaneous Omissions:

There are a few miscellaneous omissions for specific instructions:

- LINK.L is not supported
- MOVE to CCR/SR: Source must be Immediate or Data Register
- MOVE from CCR/SR: Destination must be data register
- In the Version 4 core, BSR and Bcc accept 8-, 16- or 32-bit displacement as in most 680x0 processors, Version 2 and 3 only accept 8- and 16-bit.

Instructions which behave differently from the 680x0 equivalent:

There are a few subtle cases where the ColdFire instruction is not exactly the same as its 680x0 counterpart. The most important of these is that multiply instructions (MULU and MULS) do not set the overflow bit. This means that a 680x0 code sequence which checks for overflow on multiply may assemble and apparently run under ColdFire, but give incorrect results. ASL and ASR also differ in that they do not set the overflow bit - but this is less likely to cause problems for real programs!

New Instructions in ColdFire Version 4 Architecture:

In Version 4 of the ColdFire architecture, as well as re-introducing some instructions present in 680x0 but missing from earlier ColdFire cores (e.g.
CMP.B/W, BRA.L, BSR.L, Bcc.L), there are two new instructions which PortAsm can make use of:

  MVS <ea>,Dn    Move byte or word and sign-extend to 32-bits in Dn
  MVZ <ea>,Dn    Move byte or word and zero-extend to 32-bits in Dn

[Comment by ZN: these are used for much more efficient emulation of unimplemented .B and .W logic and arithmetic instructions]

Simplification of the supervisor programming model:

The most important simplification in ColdFire's supervisor-level model is that there is only one stack pointer, shared for all code including interrupts, supervisor-level services, and user code. It follows from this that, on ColdFire, it is never safe to write below the stack, since any interrupt which occurs would overwrite the stored data. (Writing below the stack, though not recommended, is possible in some 680x0 systems in user mode, because interrupts cause a switch to the Interrupt or Supervisor Stack Pointer).

A further issue is that ColdFire processors automatically align the stack to a four-byte boundary when an exception occurs, which can cause problems if code is reading or writing at a fixed offset from the stack pointer. In fact, it is strongly recommended (for performance reasons) that the ColdFire stack should be kept long-word aligned at all times.
:endquote:

Now, there is no 'right way' to make 68k programs CF compatible. Obviously, for the ones where the assembly source is available, PortASM seems the most logical way; however, it is worth looking in PortASM's documentation to see what it does, how and why. The program is actually quite clever, but even so, knowing what the programmer originally intended can save a LOT of code with very little modification. The most logical choice for an 'interactive rewrite' approach, where PortASM can be used but the end result carefully examined and simplified or amended by hand, would of course be SMSQ/E itself, as well as all low level code such as drivers.
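
To make the earlier comments about .B/.W emulation and the V flag a bit more concrete, here is a minimal sketch in Python (not ColdFire code, and not taken from PortASM or CF68KLib) of what emulating a 68000 ADD.W involves when only 32-bit arithmetic is available. The function names `mvs_w` and `add_w` are my own inventions for illustration; `mvs_w` merely models what MVS.W does. Note how N and Z fall out almost for free, while C and V each need an extra step.

```python
MASK32 = 0xFFFFFFFF


def mvs_w(value):
    """Model of MVS.W: sign-extend the low 16 bits into a 32-bit value."""
    value &= 0xFFFF
    return value | 0xFFFF0000 if value & 0x8000 else value


def add_w(dest, src):
    """Emulate 68000 ADD.W src,dest using only 32-bit operations.

    Returns (new_dest, flags). This is an illustrative model only.
    """
    a = mvs_w(dest)
    b = mvs_w(src)
    total = (a + b) & MASK32          # the ADD.L that a 32-bit-only ALU can do
    low = total & 0xFFFF

    # A real ADD.W leaves the upper word of the destination untouched.
    new_dest = (dest & 0xFFFF0000) | low

    n = bool(low & 0x8000)
    z = low == 0
    # Carry out of bit 15: add the zero-extended operands and test bit 16.
    c = bool(((dest & 0xFFFF) + (src & 0xFFFF)) & 0x10000)
    # Overflow: the 32-bit sum of the sign-extended operands does not
    # survive being cut back to 16 bits and sign-extended again.
    v = mvs_w(low) != total

    return new_dest, {"N": n, "Z": z, "V": v, "C": c, "X": c}
```

For example, `add_w(0x12347FFF, 1)` gives a destination of 0x12348000 with N and V set, as a real 68000 would. If the code following the instruction never looks at V or C, those steps can simply be dropped, which is exactly where knowing the programmer's intent pays off.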

For the rest of the programs, it is possible to resort to emulation. That being said, imposing sensible restrictions (most of which are in place anyway) on which instructions can or cannot be used can simplify this task quite considerably. This is all the more important knowing that the V4e ColdFire implements 32k of very fast RAM on the chip (in addition to 2x 32k cache) which is ideally suited to hold emulation code. This means that all such efficiency-critical code needs to fit into 32k. Given that the basic OS fits into less, in the QL world this should not be a problem. In particular, most of the attention in hand-coding of emulations should be concentrated within the OS, handling differences caused by the User and Supervisor stack pointers now being the same physical register, most of which will concern the handling of TRAPs.

However, emulation is not perfect. Some code really can only work if changed or patched. Here is an excerpt from the CF emulation library documentation, also by MicroAPL:

:quote:
CF68KLib will install its own handlers for the 'Illegal Instruction' and 'Address Error' exceptions which can occur when unimplemented 680x0 instructions are executed on ColdFire. These handlers will perform the same function as the missing instruction and then return execution to the instruction which follows. The existence of CF68KLib is thus transparent to your application code except that it can incur a substantial performance penalty.

It is occasionally necessary to make minor modifications to your 680x0 program before it can be run under CF68KLib. This is because there are a very small number of 680x0 instructions which are also legal in ColdFire (and hence do not cause an exception) but which do not behave identically. As an example, the MULS instruction executes identically under ColdFire except that it does not set the Overflow flag in the condition codes register.

The 68000 and 68010 versions of CF68KLib depart slightly from the behaviour of the original processor when accessing memory. When the 68000-processor version of CF68KLib accesses memory it uses the full 32-bit 68K address (adding the 32-bit base address offset), rather than the 24-bit address which a real 68000 would use. In addition, it will not signal an exception on unaligned memory accesses (e.g. reading a 16- or 32-bit value at an odd address). In this respect they behave like a 68020 or higher processor.

[Comment by ZN: most programs on the QL are written for the above sort of environment anyway!]

Certain 680x0 instructions which have no equivalent in ColdFire cannot sensibly be handled by CF68KLib. These instructions are:

  CAS: Compare-and-Swap instruction used to implement semaphores in a multi-processor environment.
  CAS2 (68020 and higher): Similar to CAS
  MOVES (68010 and higher): Move Address Space instruction
  BKPT (68020 and higher): Hardware breakpoint instruction
  CALLM (68020 only): Call Module instruction uses external hardware for access control.
  RTM (68020 only): Similar to CALLM

[Comment by ZN: most (all?) of these are never used in user code, and only CAS and MOVES could possibly be used in OS code which would normally initialize a 68020+ CPU and which would have to be rewritten anyway for ColdFire.
The same is true for some special registers, also discussed in the CFlib documentation]

The possible conditions under which CF68KLib detects an emulation error are as follows:

1) occurs if the 680x0 instruction was something like:

    move.l -4(a7),(a0,d0.w)

The problem with this instruction is that it is not legal in ColdFire and hence causes an exception, but by the time CF68KLib is called to begin emulating the instruction the exception stack frame has over-written the data at -4(a7)

2) occurs if the 680x0 instruction was something like:

    move.l (a7)+,(a0,d0.w)

The problem with this instruction is that the exception does not occur until the ColdFire processor is half-way through processing it - it successfully fetches the source operand from (a7)+ but then discovers that the destination operand is not legal for ColdFire. It then proceeds to take an exception, but the exception stack frame over-writes the source operand.

[Comment by ZN: most if not all of the above is not part of standard QL code writing practice, but may appear in C compiled programs - research is needed to check if C68 generates such sequences.]

The principle behind the successful operation of CF68KLib is that all 680x0 instructions are either legal in ColdFire - and behave identically - or cause an exception which CF68KLib can catch to handle the differences. Unfortunately, ...there are a very few 680x0 instructions for which this is not the case.

To recap:

1. Certain 68020 multiply/divide instructions don't trap out and don't give the same result:

    MULS.L  <ea>,Dh:Dl    (Signed multiply: 32x32 -> 64)
    MULU.L  <ea>,Dh:Dl    (Unsigned multiply: 32x32 -> 64)
    DIVS.L  <ea>,Dr:Dq    (Signed divide: 64/32 -> 32r:32q)
    DIVSL.L <ea>,Dr:Dq    (Signed divide: 32/32 -> 32r:32q)
    DIVU.L  <ea>,Dr:Dq    (Unsigned divide: 64/32 -> 32r:32q)
    DIVUL.L <ea>,Dr:Dq    (Unsigned divide: 32/32 -> 32r:32q)

[Comment by ZN: 68000 does not implement most of these anyway and 99.99% of all QL code is 68000]

2.
The multiply instructions (MULU and MULS) do not set the overflow bit. This means that a 680x0 code sequence which checks for overflow on multiply may run under ColdFire, but give incorrect results.

3. The arithmetic shift instructions (ASL and ASR) also differ in that they do not set the overflow bit

4. The instructions "MOVE.B <ea>,-(A7)" and "MOVE.B (A7)+,<ea>" only change the stack pointer by one - on 680x0 the stack pointer would change by two.

IF ANY OF THESE DIFFERENCES AFFECT THE CORRECT OPERATION OF YOUR 680X0 PROGRAM YOU WILL NEED TO MAKE CHANGES TO THE SOURCE CODE!

To handle 2. or 3. or 4. you need to recode the source to avoid using the problem instruction... What is needed is a way of forcing the ColdFire processor to trap so that CF68KLib can handle the instruction. CF68KLib reassigns one of the 16-bit opcodes, 0x4E00, which is not used in 680x0 or ColdFire and which causes an exception. This is used as an 'escape' telling CF68KLib to emulate the next instruction. For example if your source code contains:

    .short 0x4E00
    divsl.l d0,d1:d2

...then CF68KLib will catch the exception caused by the 0x4E00 opcode and emulate the DIVSL instruction which follows as though it had itself caused the exception.
:endquote:

All of this seems like a whole lot of bother, but the prospect of running SMSQ/E on a V4e ColdFire is quite appealing. This CPU has substantial computing power (out of reach of foreseeable emulated platforms for a while yet), and includes on chip nearly all one needs to build a very full featured computer. The particular chip of choice would be the 5474 or 75. It interfaces to 266MHz DDR RAM, performs over 400MIPS at 266MHz and includes a PCI2.2 bus, USB2.0 device controller, 10/100 Ethernet, MMU, FPU, EMAC DSP extension, serial ports etc etc... using less than 1.5W of power, and costing about $25 in large quantity. With V5 cores just being announced by Motorola/Freescale, running up to 1.6x faster, it seems the logical choice...
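
As an aside on the 0x4E00 escape described in the quote: where source is available, adding the prefix could in principle be done mechanically. What follows is only a rough sketch in Python of such a pass, not anything MicroAPL ships. The helper names `needs_escape` and `add_escape_prefix` are hypothetical, and I am assuming one instruction per line, no label on the same line, `;` as the comment character, and the `.short` directive as written in the manual's example. It flags only the register-pair forms listed in item 1 of the recap.

```python
import re

# Mnemonics from item 1 of the recap; they matter only when the
# destination is a register pair (Dh:Dl or Dr:Dq).
PAIR_MNEMONICS = {"muls.l", "mulu.l", "divs.l", "divu.l", "divsl.l", "divul.l"}

# The escape opcode, written the way the manual's example writes it.
ESCAPE_LINE = ".short 0x4E00"

# A destination such as d1:d2 at the end of the operand field.
PAIR_DEST = re.compile(r",\s*d[0-7]\s*:\s*d[0-7]\s*$", re.IGNORECASE)


def needs_escape(line):
    """Return True if the line is a pair-destination multiply or divide."""
    code = line.split(";", 1)[0].strip()   # assumption: ';' starts a comment
    parts = code.split(None, 1)            # mnemonic, then operand field
    if len(parts) != 2:
        return False
    mnemonic, operands = parts
    return mnemonic.lower() in PAIR_MNEMONICS and bool(PAIR_DEST.search(operands))


def add_escape_prefix(lines):
    """Insert the escape opcode before each line that needs it."""
    out = []
    for line in lines:
        if needs_escape(line):
            # Reuse the instruction's indentation for the inserted line.
            indent = line[: len(line) - len(line.lstrip())]
            # Skip the insert if the previous line is already the escape.
            if not (out and out[-1].strip().lower() == ESCAPE_LINE.lower()):
                out.append(indent + ESCAPE_LINE)
        out.append(line)
    return out
```

Fed the single line `    divsl.l d0,d1:d2`, this returns the two-line sequence shown in the manual's example, while an ordinary `muls.l d0,d1` (32-bit result, legal on ColdFire) is left alone. It does nothing for items 2 to 4, which still need the code recoded by hand.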

Using ARM would require a LOT more time invested in writing an emulator, plus getting familiar with new development tools (if not getting them to work), whereas development for CF processors can in fact be done on 68k. Using the same approach with PPC also adds the 'reinventing the wheel' issue, as Apple has done this already with the PPC based Mac. Both would either be unable to provide comparable performance to CF at a given price, or need a lot more money put into the hardware to provide acceptable performance, unless thousands of man-hours of work are invested in emulator technology; and if PPC was used, some attorney from Apple would no doubt come a-knockin'. Using a Crusoe, for example, could ultimately yield the most efficient emulation (out of reach of PC based emulators for the foreseeable future) at a reasonable hardware price, but again even more thousands of man-hours of work would have to be invested in the emulator, though one could sell this emulator elsewhere for very decent money, I am sure. Other exotic approaches include using DSPs, but these continually suffer from roadmap problems: they are performance driven, not necessarily retaining compatibility.

N.
_______________________________________________
QL-Users Mailing List
http://www.q-v-d.demon.co.uk/smsqe.htm
