
Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-03-03 Thread Bernd Mueller

Bernd Mueller wrote: (very unexpected) result of this benchmark is that a version leaving the TStroke record packed is about 13% faster than the original patch. I am going to send a new patch soon. Unfortunately, this one is about 10% slower on x86. So, I am going to leave this to the

Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-03-02 Thread Florian Klaempfl

Daniël Mantione wrote: > On Fri, 29 Feb 2008, Christian Iversen wrote: >> Daniël Mantione wrote: >>> On Fri, 29 Feb 2008, Christian Iversen wrote: > Instead "unaligned" will simulate an unaligned load with two loads and some rotation etc. On the ARM, where every

Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-02-29 Thread Jonas Maebe

On 01 Mar 2008, at 02:00, Luiz Americo Pereira Camara wrote: The question is: using the layout below with packed (I can force the set size to be equal to Delphi), do I still have unaligned memory access? As long as your record is declared as "packed", all memory accesses are handled as if they

Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-02-29 Thread Luiz Americo Pereira Camara

Jonas Maebe wrote: On 29 Feb 2008, at 01:55, Luiz Americo Pereira Camara wrote: One more question: The VirtualTreeView tries to make the fields of the (packed) record aligned at dword boundary by grouping together smaller (one or two byte fields) or adding dummy fields. Does this trick over

Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-02-29 Thread Luiz Americo Pereira Camara

Vinzent Hoefler wrote: Are enumeration types 1 or 4 bytes in Delphi? If they are one byte, it looks quite different (and I'm not sure about all the types used here, some seem to be sets, some enumerations). But at first glance it seems they used both packed records to either ensure minimum

Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-02-29 Thread Christian Iversen

Daniël Mantione wrote: > On Fri, 29 Feb 2008, Christian Iversen wrote: >> Daniël Mantione wrote: >>> On Fri, 29 Feb 2008, Christian Iversen wrote: > Instead "unaligned" will simulate an unaligned load with two loads and some rotation etc. On the ARM, where every m

Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-02-29 Thread Daniël Mantione

On Fri, 29 Feb 2008, Christian Iversen wrote: Daniël Mantione wrote: On Fri, 29 Feb 2008, Christian Iversen wrote: Instead "unaligned" will simulate an unaligned load with two loads and some rotation etc. On the ARM, where every mnemonic can rotate operands, this isn't that bad of

Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-02-29 Thread Christian Iversen

Daniël Mantione wrote: On Fri, 29 Feb 2008, Christian Iversen wrote: Instead "unaligned" will simulate an unaligned load with two loads and some rotation etc. On the ARM, where every mnemonic can rotate operands, this isn't that bad of a penalty. Therefore, I wouldn't be surprised tha

Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-02-29 Thread Yury Sidorov

From: "Daniël Mantione" <[EMAIL PROTECTED]> Instead "unaligned" will simulate an unaligned load with two loads and some rotation etc. On the ARM, where every mnemonic can rotate operands, this isn't that bad of a penalty. Therefore, I wouldn't be surprised that even on ARM, arrays with pac

Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-02-29 Thread Daniël Mantione

On Fri, 29 Feb 2008, Christian Iversen wrote: Instead "unaligned" will simulate an unaligned load with two loads and some rotation etc. On the ARM, where every mnemonic can rotate operands, this isn't that bad of a penalty. Therefore, I wouldn't be surprised that even on ARM, arrays wi
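
The "two loads and some rotation" idea quoted above can be illustrated with a small Python model. This is only a sketch of the general technique, assuming a little-endian 32-bit machine whose load instruction accepts only 4-byte-aligned addresses; the helper names `aligned_load32` and `unaligned_load32` are hypothetical, and the code does not claim to reproduce what Free Pascal emits for `unaligned`.

```python
MASK32 = 0xFFFFFFFF

def aligned_load32(mem: bytes, addr: int) -> int:
    """Model of a hardware load that only accepts 4-byte-aligned addresses."""
    if addr % 4 != 0:
        raise ValueError("bus error: misaligned 32-bit load")
    return int.from_bytes(mem[addr:addr + 4], "little")

def unaligned_load32(mem: bytes, addr: int) -> int:
    """Emulate a misaligned load with two aligned loads plus shifts."""
    base = addr & ~3           # aligned word containing the first byte
    shift = (addr & 3) * 8     # bit offset of the value inside that word
    low = aligned_load32(mem, base)
    if shift == 0:
        return low             # already aligned, one load is enough
    high = aligned_load32(mem, base + 4)
    # Take the upper part of the first word and the lower part of the second.
    return ((low >> shift) | (high << (32 - shift))) & MASK32

memory = bytes(range(16))      # 00 01 02 ... 0f
value = unaligned_load32(memory, 1)
print(hex(value))
```

The cost in this model is one extra load and a few shift/or operations per misaligned access, which is the trade-off the posters weigh against the cache effects of larger, padded records.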

Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-02-29 Thread Christian Iversen

Daniël Mantione wrote: On Fri, 29 Feb 2008, Christian Iversen wrote: Memory access. What happens is that the non-packed version causes more cache misses. A cache miss costs many cycles on a modern CPU; a misaligned read just costs an extra memory access (which is fast if cached) on x86, a

Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-02-29 Thread Daniël Mantione

On Fri, 29 Feb 2008, Christian Iversen wrote: Memory access. What happens is that the non-packed version causes more cache misses. A cache miss costs many cycles on a modern CPU; a misaligned read just costs an extra memory access (which is fast if cached) on x86, and extra load instructio

Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-02-29 Thread Christian Iversen

Daniël Mantione wrote: On Tue, 26 Feb 2008, Luiz Americo Pereira Camara wrote: Yury Sidorov wrote: The patch removes packed record for some platforms. IMO packed can be removed for all platforms. It will gain some speed. I'd like to understand this issue better. Why are non-packed records

Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-02-29 Thread Jonas Maebe

On 29 Feb 2008, at 01:55, Luiz Americo Pereira Camara wrote: One more question: The VirtualTreeView tries to make the fields of the (packed) record aligned at a dword boundary by grouping together smaller (one or two byte) fields or adding dummy fields. Does this trick override the unalig

Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-02-29 Thread Michael Schnell

The VirtualTreeView tries to make the fields of the (packed) record aligned at a dword boundary by grouping together smaller (one or two byte) fields or adding dummy fields. Does this trick override the unaligned memory access? Of course it is always a good idea to sort the members of a record

Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-02-29 Thread Marco van de Voort

> Are enumeration types 1 or 4 bytes in Delphi? If they are one byte, it > looks quite different (and I'm not sure about all the types used here, > some seem to be sets, some enumerations). Can be configured: http://lists.freepascal.org/docs-html/prog/progsu50.html Delphi has the minenumsize o

Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-02-28 Thread Vinzent Hoefler

Are enumeration types 1 or 4 bytes in Delphi? If they are one byte, it looks quite different (and I'm not sure about all the types used here, some seem to be sets, some enumerations). But at first glance it seems they used both packed records to either ensure minimum size or known record l

Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-02-28 Thread Luiz Americo Pereira Camara

Luiz Americo Pereira Camara wrote:
TVirtualNodePacked = packed record
  Index,                       //Offset 0
  ChildCount: Cardinal;        //Offset 4
  NodeHeight: Word;            //Offset 8
  States: TVirtualNodeStates;  //Offset 10 *
  Align: Byte;                 //Offset 14 **
  CheckState: TCheckState;     //Offset
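
The offsets quoted in this message can be checked with a small layout calculator. The following Python sketch is a model, not Free Pascal code: it covers only the fields visible in the truncated quote, assumes `States` occupies 4 bytes (as the jump from offset 10 to 14 suggests), and assumes that a non-packed field is aligned to its own size. The actual compiler rules for sets and enumerations may differ.

```python
# (name, size in bytes); sizes are assumptions read off the quoted offsets.
FIELDS = [
    ("Index", 4),       # Cardinal
    ("ChildCount", 4),  # Cardinal
    ("NodeHeight", 2),  # Word
    ("States", 4),      # assumed 4 bytes (offset 10 to 14 in the quote)
    ("Align", 1),       # Byte
]

def layout(fields, packed):
    """Return ({field name: offset}, total record size in bytes)."""
    offsets = {}
    pos = 0
    max_align = 1
    for name, size in fields:
        # Packed: no padding. Otherwise: align each field to its own size.
        align = 1 if packed else size
        pos = (pos + align - 1) // align * align
        offsets[name] = pos
        pos += size
        max_align = max(max_align, align)
    # Pad the record so that array elements keep every field aligned.
    total = (pos + max_align - 1) // max_align * max_align
    return offsets, total

packed_offsets, packed_size = layout(FIELDS, packed=True)
natural_offsets, natural_size = layout(FIELDS, packed=False)
print("packed :", packed_offsets, packed_size)
print("natural:", natural_offsets, natural_size)
```

Under these assumptions the packed layout puts `States` at offset 10, which is not a multiple of 4, while the naturally aligned layout moves it to offset 12 and grows the record. That is the size-versus-alignment trade-off debated throughout the thread.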

Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-02-28 Thread Luiz Americo Pereira Camara

Daniël Mantione wrote: On Tue, 26 Feb 2008, Luiz Americo Pereira Camara wrote: Yury Sidorov wrote: The patch removes packed record for some platforms. IMO packed can be removed for all platforms. It will gain some speed. I'd like to understand this issue better. Why are non-packed records

Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-02-28 Thread Daniël Mantione

On Thu, 28 Feb 2008, Michael Schnell wrote: An ARM does not have such logic and will suffer cache miss after cache miss. Nonetheless the count of word transfers between memory and the cache would be smaller with packed records, which might result in a lot faster execution (of course de

Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-02-28 Thread Michael Schnell

An ARM does not have such logic and will suffer cache miss after cache miss. Nonetheless the count of word transfers between memory and the cache would be smaller with packed records, which might result in a lot faster execution (of course depending on the layout of the record, speed of the

Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-02-28 Thread Daniël Mantione

On Thu, 28 Feb 2008, Yury Sidorov wrote: Yes, but if you have an array of them (as we have in this case), considerably more of these records will fit in the cache. Therefore you will have considerably fewer cache misses. This becomes even more serious when the processor in question does not h
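
The cache argument made here is simple arithmetic, sketched below in Python. The record sizes (15 bytes packed, 20 bytes padded) and the 32-byte cache line are hypothetical values chosen for illustration, not figures taken from the thread or from any particular ARM core.

```python
def cache_lines_touched(record_size, count, line_size=32, base=0):
    """Number of distinct cache lines covered by an array of records."""
    if count == 0:
        return 0
    first_line = base // line_size
    last_line = (base + record_size * count - 1) // line_size
    return last_line - first_line + 1

# Hypothetical sizes: the same record packed (15 bytes) and padded (20 bytes).
packed_lines = cache_lines_touched(15, 1000)
padded_lines = cache_lines_touched(20, 1000)
print(packed_lines, padded_lines)
```

In this model a sequential pass over the padded array pulls in about a third more cache lines than the packed one. Whether that outweighs the cost of misaligned field access depends on the processor, which is what the benchmark reported elsewhere in the thread measures.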

Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-02-28 Thread Yury Sidorov

From: "Daniël Mantione" <[EMAIL PROTECTED]> > On Thursday 28 February 2008 09:16, Daniël Mantione wrote: > >> Memory access. What happens is that the non-packed version causes >> more cache misses. > > Please elaborate. If the (unaligned) data is crossing a > cache-line, thus > causing two full

Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-02-28 Thread Vinzent Hoefler

On Thursday 28 February 2008 11:25, Daniël Mantione wrote: > On Thu, 28 Feb 2008, Vinzent Hoefler wrote: > > On Thursday 28 February 2008 09:16, Daniël Mantione wrote: > >> Memory access. What happens is that the non-packed version causes > >> more cache misses. > > OMG. I'm so confused. ;) I

Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-02-28 Thread Daniël Mantione

On Thu, 28 Feb 2008, Vinzent Hoefler wrote: On Thursday 28 February 2008 09:16, Daniël Mantione wrote: Memory access. What happens is that the non-packed version causes more cache misses. Please elaborate. If the (unaligned) data is crossing a cache-line, thus causing two full cache-line

Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-02-28 Thread Michael Schnell

internally the processor still has to have separate "8 bit" data paths and do shifting to reorder the bytes. This is a barrel shifter in the data path that is integrated in the queue and does not take an additional execution cycle. -Michael

Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-02-28 Thread Micha Nelissen

Michael Schnell wrote: If it accesses a misaligned 32 bit value it does two accesses (not 4): e.g. once 8 bit and once 24 bit (when reading each of the accesses is the same 32 bit, anyway). Logically you should think about it how I explained. That Intel did an optimization to make the speed i

Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-02-28 Thread Michael Schnell

Micha Nelissen wrote: In addition to what the others said, think of it like your 32-bit processor suddenly being an 8-bit processor: it has to manually load 4 times 8 bits, arrange them into a 32-bit value, and only then use it. With non-packed, it can use the value directly. With an x86 no addit

Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-02-28 Thread Michael Schnell

On x86 processors it's usually only a speed penalty (or has anyone ever seen the AC flag turned on?), on other processors you may even have to workaround exceptions (i.e. bus errors), because the processor simply refuses to read or write unaligned data. It even is not guaranteed (or even comm

Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-02-28 Thread Micha Nelissen

Luiz Americo Pereira Camara wrote: Why are non-packed records faster? Does the difference occur at memory allocation or at memory access? In addition to what the others said, think of it like your 32-bit processor suddenly being an 8-bit processor: it has to manually load 4 times 8 bits, arrange th

Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-02-28 Thread Vinzent Hoefler

On Thursday 28 February 2008 09:16, Daniël Mantione wrote: > Memory access. What happens is that the non-packed version causes > more cache misses. Please elaborate. If the (unaligned) data is crossing a cache-line, thus causing two full cache-line reads, I'd understand that, but once it's in t

Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-02-28 Thread Vinzent Hoefler

On Tuesday 26 February 2008 17:27, Luiz Americo Pereira Camara wrote: > Yury Sidorov wrote: > > The patch removes packed record for some platforms. > > IMO packed can be removed for all platforms. It will gain some > > speed. > > I'd like to understand more this issue. > Why are non packed records

Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-02-28 Thread Michael Schnell

Why are non-packed records faster? Cache thrashing. One of the most underestimated performance killers in modern software. Smaller (packed) records will need less cache space and thus should be faster regarding the memory interface. -Michael

Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-02-28 Thread Daniël Mantione

On Tue, 26 Feb 2008, Luiz Americo Pereira Camara wrote: Yury Sidorov wrote: The patch removes packed record for some platforms. IMO packed can be removed for all platforms. It will gain some speed. I'd like to understand this issue better. Why are non-packed records faster? Cache thrashing

Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-02-28 Thread Luiz Americo Pereira Camara

Yury Sidorov wrote: The patch removes packed record for some platforms. IMO packed can be removed for all platforms. It will gain some speed. I'd like to understand this issue better. Why are non-packed records faster? Does the difference occur at memory allocation or at memory access? Original (Del

Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-02-26 Thread Bernd Mueller

Daniël Mantione wrote: On Tue, 26 Feb 2008, Bernd Mueller wrote: Daniël Mantione wrote: On Tue, 26 Feb 2008, Florian Klaempfl wrote: Daniël Mantione wrote: On Tue, 26 Feb 2008, Vincent Snijders wrote: Bernd Mueller wrote: Hello, the attached patch avoids misaligned data

Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-02-26 Thread Daniël Mantione

On Tue, 26 Feb 2008, Bernd Mueller wrote: Daniël Mantione wrote: On Tue, 26 Feb 2008, Florian Klaempfl wrote: Daniël Mantione wrote: On Tue, 26 Feb 2008, Vincent Snijders wrote: Bernd Mueller wrote: Hello, the attached patch avoids misaligned data access (bus errors), dur

Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-02-26 Thread Micha Nelissen

Bernd Mueller wrote: the main affected routines are unpack and decode. Both routines were called for every single character (only for a stroked font) via OutTextXYDefault. So speed is not unimportant ;-) Perhaps you can separate I/O and processing? Read into "unpacked" structure and process f

Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-02-26 Thread Bernd Mueller

Daniël Mantione wrote: On Tue, 26 Feb 2008, Florian Klaempfl wrote: Daniël Mantione wrote: On Tue, 26 Feb 2008, Vincent Snijders wrote: Bernd Mueller wrote: Hello, the attached patch avoids misaligned data access (bus errors), during font rendering (with the graph unit) on Arm-

Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-02-26 Thread Daniël Mantione

On Tue, 26 Feb 2008, Florian Klaempfl wrote: Daniël Mantione wrote: On Tue, 26 Feb 2008, Vincent Snijders wrote: Bernd Mueller wrote: Hello, the attached patch avoids misaligned data access (bus errors), during font rendering (with the graph unit) on Arm-Linux devices. Instea

Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-02-26 Thread Florian Klaempfl

Daniël Mantione wrote: > On Tue, 26 Feb 2008, Vincent Snijders wrote: >> Bernd Mueller wrote: >>> Hello, >>> the attached patch avoids misaligned data access (bus errors), during font rendering (with the graph unit) on Arm-Linux devices. >> Instead of testing for arm c

Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-02-26 Thread Yury Sidorov

From: "Daniël Mantione" <[EMAIL PROTECTED]> > Bernd Mueller wrote: >> Hello, >> the attached patch avoids misaligned data access (bus errors), during font rendering (with the graph unit) on Arm-Linux devices. > Instead of testing for arm cpu, you could use FPC_REQUIRES_PROPER

Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-02-26 Thread Bernd Mueller

Vincent Snijders wrote: Instead of testing for arm cpu, you could use FPC_REQUIRES_PROPER_ALIGNMENT too. So it is fixed for sparc as well. Yes, the changed patch is attached. Regards, Bernd. Index: packages/graph/src/inc/gtext.inc

Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-02-26 Thread Daniël Mantione

On Tue, 26 Feb 2008, Vincent Snijders wrote: Bernd Mueller wrote: Hello, the attached patch avoids misaligned data access (bus errors), during font rendering (with the graph unit) on Arm-Linux devices. Instead of testing for arm cpu, you could use FPC_REQUIRES_PROPER_ALIGNMENT too.

Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-02-26 Thread Vincent Snijders

Bernd Mueller wrote: Hello, the attached patch avoids misaligned data access (bus errors), during font rendering (with the graph unit) on Arm-Linux devices. Instead of testing for arm cpu, you could use FPC_REQUIRES_PROPER_ALIGNMENT too. So it is fixed for sparc as well. Vincent
