Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-03-03 Thread Bernd Mueller
Bernd Mueller wrote: (very unexpected) result of this benchmark is that a version leaving the TStroke record packed is about 13% faster than the original patch. I am going to send a new patch soon. Unfortunately this one is about 10% slower on x86. So, I am going to leave this to

Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-03-02 Thread Florian Klaempfl
Daniël Mantione wrote: On Fri, 29 Feb 2008, Christian Iversen wrote: Daniël Mantione wrote: On Fri, 29 Feb 2008, Christian Iversen wrote: Instead unaligned will simulate an unaligned load with two loads and some rotation etc. On the ARM, where every mnemonic can rotate

Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-02-29 Thread Marco van de Voort
Are enumeration types 1 or 4 bytes in Delphi? If they are one byte, it looks quite different (and I'm not sure about all the types used here, some seem to be sets, some enumerations). Can be configured: http://lists.freepascal.org/docs-html/prog/progsu50.html Delphi has the minenumsize
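The enumeration-size question can be checked directly in Free Pascal. The following is a minimal sketch, assuming the `{$PACKENUM}` directive is the relevant switch; the type names are hypothetical and are not taken from VirtualTreeView.

```pascal
program EnumSizeSketch;
{$mode objfpc}

type
  {$PACKENUM 1}
  // Stored in the smallest unit that fits: one byte is enough for three values.
  TCheckSmall = (csA, csB, csC);

  {$PACKENUM 4}
  // Same three values, but each variable occupies at least four bytes.
  TCheckLarge = (clA, clB, clC);

begin
  // Print both sizes so the effect of the directive is visible.
  WriteLn('SizeOf(TCheckSmall) = ', SizeOf(TCheckSmall));
  WriteLn('SizeOf(TCheckLarge) = ', SizeOf(TCheckLarge));
end.
```

Because the enumeration size changes the offset of every field that follows it, a record meant to match a Delphi layout needs the same setting on both compilers.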

Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-02-29 Thread Michael Schnell
The VirtualTreeView tries to make the fields of the (packed) record aligned at dword boundary by grouping together smaller (one or two byte fields) or adding dummy fields. Does this trick override the unaligned memory access? Of course it is always a good idea to sort the members of a record

Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-02-29 Thread Jonas Maebe
On 29 Feb 2008, at 01:55, Luiz Americo Pereira Camara wrote: One more question: The VirtualTreeView tries to make the fields of the (packed) record aligned at dword boundary by grouping together smaller (one or two byte fields) or adding dummy fields. Does this trick override the

Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-02-29 Thread Christian Iversen
Daniël Mantione wrote: On Tue, 26 Feb 2008, Luiz Americo Pereira Camara wrote: Yury Sidorov wrote: The patch removes packed record for some platforms. IMO packed can be removed for all platforms. It will gain some speed. I'd like to understand more this issue. Why are non packed records

Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-02-29 Thread Christian Iversen
Daniël Mantione wrote: On Fri, 29 Feb 2008, Christian Iversen wrote: Memory access. What happens is that the non-packed version causes more cache misses. A cache miss costs many cycles on a modern cpu, a misaligned read just costs an extra memory access (which is fast if cached) on x86,

Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-02-29 Thread Daniël Mantione
On Fri, 29 Feb 2008, Christian Iversen wrote: Memory access. What happens is that the non-packed version causes more cache misses. A cache miss costs many cycles on a modern cpu, a misaligned read just costs an extra memory access (which is fast if cached) on x86, and extra load

Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-02-29 Thread Daniël Mantione
On Fri, 29 Feb 2008, Christian Iversen wrote: Instead unaligned will simulate an unaligned load with two loads and some rotation etc. On the ARM, where every mnemonic can rotate operands, this isn't that bad of a penalty. Therefore, I wouldn't be surprised that even on ARM, arrays

Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-02-29 Thread Daniël Mantione
On Fri, 29 Feb 2008, Christian Iversen wrote: Daniël Mantione wrote: On Fri, 29 Feb 2008, Christian Iversen wrote: Instead unaligned will simulate an unaligned load with two loads and some rotation etc. On the ARM, where every mnemonic can rotate operands, this isn't that bad of

Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-02-29 Thread Christian Iversen
Daniël Mantione wrote: On Fri, 29 Feb 2008, Christian Iversen wrote: Instead unaligned will simulate an unaligned load with two loads and some rotation etc. On the ARM, where every mnemonic can rotate operands, this isn't that bad of a penalty. Therefore, I wouldn't be surprised that

Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-02-29 Thread Yury Sidorov
From: Daniël Mantione Instead unaligned will simulate an unaligned load with two loads and some rotation etc. On the ARM, where every mnemonic can rotate operands, this isn't that bad of a penalty. Therefore, I wouldn't be surprised that even on ARM, arrays with packed

Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-02-29 Thread Christian Iversen
Daniël Mantione wrote: On Fri, 29 Feb 2008, Christian Iversen wrote: Daniël Mantione wrote: On Fri, 29 Feb 2008, Christian Iversen wrote: Instead unaligned will simulate an unaligned load with two loads and some rotation etc. On the ARM, where every mnemonic can rotate operands,

Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-02-29 Thread Luiz Americo Pereira Camara
Vinzent Hoefler wrote: Are enumeration types 1 or 4 bytes in Delphi? If they are one byte, it looks quite different (and I'm not sure about all the types used here, some seem to be sets, some enumerations). But at the first glance it seems, they used both packed records to either ensure

Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-02-29 Thread Luiz Americo Pereira Camara
Jonas Maebe wrote: On 29 Feb 2008, at 01:55, Luiz Americo Pereira Camara wrote: One more question: The VirtualTreeView tries to make the fields of the (packed) record aligned at dword boundary by grouping together smaller (one or two byte fields) or adding dummy fields. Does this trick

Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-02-29 Thread Jonas Maebe
On 01 Mar 2008, at 02:00, Luiz Americo Pereira Camara wrote: The question is: using the layout below with packed (I can force the set size to be equal to Delphi), do I still have unaligned memory access? As long as your record is declared as packed, all memory accesses are handled as if they

Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-02-28 Thread Daniël Mantione
On Tue, 26 Feb 2008, Luiz Americo Pereira Camara wrote: Yury Sidorov wrote: The patch removes packed record for some platforms. IMO packed can be removed for all platforms. It will gain some speed. I'd like to understand more this issue. Why are non packed records faster? Cache

Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-02-28 Thread Vinzent Hoefler
On Tuesday 26 February 2008 17:27, Luiz Americo Pereira Camara wrote: Yury Sidorov wrote: The patch removes packed record for some platforms. IMO packed can be removed for all platforms. It will gain some speed. I'd like to understand more this issue. Why are non packed records faster?

Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-02-28 Thread Micha Nelissen
Luiz Americo Pereira Camara wrote: Why are non packed records faster? The difference occurs at memory allocation or at memory access? In addition to what the others said, think of it like your 32 bit processor suddenly being a 8 bit processor: it has to manually load 4 times 8 bit, arrange
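The "32 bit processor acting like an 8 bit processor" picture can be made concrete. This is only an illustrative sketch: the helper name `LoadUnalignedLE32` is hypothetical and little-endian byte order is assumed. It shows the kind of work involved when a 32-bit value at an odd address has to be assembled from four single-byte loads.

```pascal
program ByteWiseLoadSketch;
{$mode objfpc}

// Hypothetical helper: build a 32-bit little-endian value from four
// separate byte loads, so no multi-byte access ever touches P directly.
function LoadUnalignedLE32(P: PByte): LongWord;
begin
  Result := LongWord(P[0])
         or (LongWord(P[1]) shl 8)
         or (LongWord(P[2]) shl 16)
         or (LongWord(P[3]) shl 24);
end;

var
  // The value $12345678 is stored starting at index 1, i.e. at an odd offset.
  Buf: array[0..7] of Byte = ($00, $78, $56, $34, $12, $00, $00, $00);

begin
  WriteLn(HexStr(Int64(LoadUnalignedLE32(@Buf[1])), 8));
end.
```

Four loads, three shifts and three ORs replace what would be a single load for an aligned field, which is the cost being described.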

Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-02-28 Thread Michael Schnell
On x86 processors it's usually only a speed penalty (or has anyone ever seen the AC flag turned on?), on other processors you may even have to work around exceptions (i.e. bus errors), because the processor simply refuses to read or write unaligned data. It even is not guaranteed (or even

Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-02-28 Thread Micha Nelissen
Michael Schnell wrote: If it accesses a misaligned 32 bit value it does two accesses (not 4): e.g. once 8 bit and once 24 bit (when reading each of the accesses is the same 32 bit, anyway). Logically you should think about it how I explained. That Intel did an optimization to make the speed

Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-02-28 Thread Michael Schnell
internally the processor still has to have separate 8 bit data paths and do shifting to reorder the bytes. This is a barrel shifter in the data path that is integrated in the queue and does not take an additional execution cycle. -Michael

Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-02-28 Thread Daniël Mantione
On Thu, 28 Feb 2008, Vinzent Hoefler wrote: On Thursday 28 February 2008 09:16, Daniël Mantione wrote: Memory access. What happens is that the non-packed version causes more cache misses. Please elaborate. If the (unaligned) data is crossing a cache-line, thus causing two full

Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-02-28 Thread Vinzent Hoefler
On Thursday 28 February 2008 11:25, Daniël Mantione wrote: On Thu, 28 Feb 2008, Vinzent Hoefler wrote: On Thursday 28 February 2008 09:16, Daniël Mantione wrote: Memory access. What happens is that the non-packed version causes more cache misses. OMG. I'm so confused. ;) I read that

Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-02-28 Thread Yury Sidorov
From: Daniël Mantione On Thursday 28 February 2008 09:16, Daniël Mantione wrote: Memory access. What happens is that the non-packed version causes more cache misses. Please elaborate. If the (unaligned) data is crossing a cache-line, thus causing two full cache-line

Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-02-28 Thread Daniël Mantione
On Thu, 28 Feb 2008, Yury Sidorov wrote: Yes, but if you have an array of them (as we have in this case), considerably more of these records will fit in the cache. Therefore you will have considerably fewer cache misses. This becomes even more serious when the processor in question does not

Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-02-28 Thread Daniël Mantione
On Thu, 28 Feb 2008, Michael Schnell wrote: An ARM does not have such logic and will suffer cache miss after cache miss. Nonetheless the count of word transfers from memory to/from the cache would be smaller with packed records which might result in a lot faster execution (of course

Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-02-28 Thread Luiz Americo Pereira Camara
Daniël Mantione wrote: On Tue, 26 Feb 2008, Luiz Americo Pereira Camara wrote: Yury Sidorov wrote: The patch removes packed record for some platforms. IMO packed can be removed for all platforms. It will gain some speed. I'd like to understand more this issue. Why are non packed records

Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-02-28 Thread Luiz Americo Pereira Camara
Luiz Americo Pereira Camara wrote:
  TVirtualNodePacked = packed record
    Index,                       //Offset 0
    ChildCount: Cardinal;        //Offset 4
    NodeHeight: Word;            //Offset 8
    States: TVirtualNodeStates;  //Offset 10 *
    Align: Byte;                 //Offset 14 **
    CheckState: TCheckState;
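Field offsets such as the ones annotated in the record above can be measured rather than computed by hand. The sketch below uses two small hypothetical records (not the VirtualTreeView declaration, which is truncated here) and prints the size and the offset of one field for a packed and a non-packed layout. The non-packed numbers depend on the target and on the active `{$PACKRECORDS}` setting, so no particular values are assumed.

```pascal
program RecordLayoutSketch;
{$mode objfpc}

type
  // Hypothetical records with identical fields; only the packing differs.
  TPackedRec = packed record
    Flag: Byte;
    Value: LongWord;
    Height: Word;
  end;

  TNaturalRec = record
    Flag: Byte;
    Value: LongWord;
    Height: Word;
  end;

var
  P: TPackedRec;
  N: TNaturalRec;

begin
  // Offset of a field = address of the field minus address of the record.
  WriteLn('packed:     size = ', SizeOf(P),
          ', Value at offset ', PtrUInt(@P.Value) - PtrUInt(@P));
  WriteLn('non-packed: size = ', SizeOf(N),
          ', Value at offset ', PtrUInt(@N.Value) - PtrUInt(@N));
end.
```

Running the same check on the real record would show whether the dummy `Align` field really places every multi-byte field on a boundary that suits it.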

Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-02-28 Thread Vinzent Hoefler
Are enumeration types 1 or 4 bytes in Delphi? If they are one byte, it looks quite different (and I'm not sure about all the types used here, some seem to be sets, some enumerations). But at the first glance it seems, they used both packed records to either ensure minimum size or known record

Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-02-26 Thread Bernd Mueller
Vincent Snijders wrote: Instead of testing for arm cpu, you could use FPC_REQUIRES_PROPER_ALIGNMENT too. So it is fixed for sparc as well. Yes, the changed patch is attached. Regards, Bernd. Index: packages/graph/src/inc/gtext.inc

Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-02-26 Thread Vincent Snijders
Bernd Mueller wrote: Hello, the attached patch avoids misaligned data access (bus errors), during font rendering (with the graph unit) on Arm-Linux devices. Instead of testing for arm cpu, you could use FPC_REQUIRES_PROPER_ALIGNMENT too. So it is fixed for sparc as well. Vincent
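Testing `FPC_REQUIRES_PROPER_ALIGNMENT` instead of a specific CPU could look roughly like the sketch below. The record `TStrokeSketch` is a hypothetical stand-in, not the declaration from gtext.inc, and the sketch only illustrates the conditional, not the actual patch.

```pascal
program StrokeAlignSketch;
{$mode objfpc}

type
  // Hypothetical stand-in for a stroke record. "packed" is only applied
  // on targets that tolerate misaligned access.
  TStrokeSketch = {$ifndef FPC_REQUIRES_PROPER_ALIGNMENT} packed {$endif} record
    OpCode: Byte;
    X, Y: SmallInt;
  end;

begin
  {$ifdef FPC_REQUIRES_PROPER_ALIGNMENT}
  WriteLn('Strict-alignment target: natural layout, size ', SizeOf(TStrokeSketch));
  {$else}
  WriteLn('Tolerant target: packed layout, size ', SizeOf(TStrokeSketch));
  {$endif}
end.
```

One caveat with this approach: if a record mirrors a file format, dropping `packed` changes its in-memory layout, so the data can no longer be read into it with a single block read.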

Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-02-26 Thread Yury Sidorov
From: Daniël Mantione Bernd Mueller wrote: Hello, the attached patch avoids misaligned data access (bus errors), during font rendering (with the graph unit) on Arm-Linux devices. Instead of testing for arm cpu, you could use FPC_REQUIRES_PROPER_ALIGNMENT too.

Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-02-26 Thread Daniël Mantione
On Tue, 26 Feb 2008, Vincent Snijders wrote: Bernd Mueller wrote: Hello, the attached patch avoids misaligned data access (bus errors), during font rendering (with the graph unit) on Arm-Linux devices. Instead of testing for arm cpu, you could use FPC_REQUIRES_PROPER_ALIGNMENT

Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-02-26 Thread Florian Klaempfl
Daniël Mantione wrote: On Tue, 26 Feb 2008, Vincent Snijders wrote: Bernd Mueller wrote: Hello, the attached patch avoids misaligned data access (bus errors), during font rendering (with the graph unit) on Arm-Linux devices. Instead of testing for arm cpu, you could use

Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-02-26 Thread Daniël Mantione
On Tue, 26 Feb 2008, Florian Klaempfl wrote: Daniël Mantione wrote: On Tue, 26 Feb 2008, Vincent Snijders wrote: Bernd Mueller wrote: Hello, the attached patch avoids misaligned data access (bus errors), during font rendering (with the graph unit) on Arm-Linux devices.

Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-02-26 Thread Bernd Mueller
Daniël Mantione wrote: On Tue, 26 Feb 2008, Florian Klaempfl wrote: Daniël Mantione wrote: On Tue, 26 Feb 2008, Vincent Snijders wrote: Bernd Mueller wrote: Hello, the attached patch avoids misaligned data access (bus errors), during font rendering (with the graph unit) on

Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-02-26 Thread Micha Nelissen
Bernd Mueller wrote: the main affected routines are unpack and decode. Both routines were called for every single character (only for a stroked font) via OutTextXYDefault. So speed is not unimportant ;-) Perhaps you can separate I/O and processing? Read into unpacked structure and process
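The suggestion to separate I/O from processing might be sketched as follows. All names here (`TFileStroke`, `TMemStroke`, `UnpackStrokes`) are hypothetical and the field layout is assumed for illustration. The idea is to pay the cost of packed-field access once, while copying into a naturally aligned working array, and then let the per-character drawing code touch only the aligned copy.

```pascal
program UnpackOnceSketch;
{$mode objfpc}

type
  // Hypothetical on-disk layout: packed, so it matches the file byte for byte.
  TFileStroke = packed record
    OpCode: Byte;
    X, Y: SmallInt;
  end;

  // Working copy with the compiler's default alignment for fast access.
  TMemStroke = record
    OpCode: Byte;
    X, Y: SmallInt;
  end;

// Copy field by field; the compiler generates safe accesses for the
// packed source fields, so this is the only place that pays for them.
procedure UnpackStrokes(const Src: array of TFileStroke;
                        var Dst: array of TMemStroke);
var
  I: Integer;
begin
  if Length(Dst) < Length(Src) then
    Exit; // destination too small: do nothing in this sketch
  for I := 0 to High(Src) do
  begin
    Dst[I].OpCode := Src[I].OpCode;
    Dst[I].X := Src[I].X;
    Dst[I].Y := Src[I].Y;
  end;
end;

var
  Raw: array[0..1] of TFileStroke;
  Work: array[0..1] of TMemStroke;
  I: Integer;

begin
  // Stand-in for data that would normally be read from a font file.
  Raw[0].OpCode := 1; Raw[0].X := 10; Raw[0].Y := -5;
  Raw[1].OpCode := 2; Raw[1].X := 20; Raw[1].Y := 7;

  UnpackStrokes(Raw, Work);

  for I := 0 to High(Work) do
    WriteLn(Work[I].OpCode, ' ', Work[I].X, ' ', Work[I].Y);
end.
```

Whether this pays off depends on how often each stroke is re-read after loading; the benchmark figures reported elsewhere in the thread suggest measuring on both ARM and x86 before committing to it.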