Bernd Mueller wrote:
(very unexpected) result of this benchmark is that a version which
leaves the TStroke record packed is about 13 % faster than the
original patch. I am going to send a new patch soon.
Unfortunately this one is about 10 % slower on x86. So, I am going to
leave this to
Daniël Mantione wrote:
On Fri, 29 Feb 2008, Christian Iversen wrote:
Daniël Mantione wrote:
On Fri, 29 Feb 2008, Christian Iversen wrote:
Instead unaligned will simulate an unaligned load with two loads
and some rotation etc. On the ARM, where every mnemonic can rotate
Are enumeration types 1 or 4 bytes in Delphi? If they are one byte, it
looks quite different (and I'm not sure about all the types used here,
some seem to be sets, some enumerations).
Can be configured:
http://lists.freepascal.org/docs-html/prog/progsu50.html
Delphi has the minenumsize
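To illustrate the enumeration-size question above, here is a minimal Free Pascal sketch. It assumes that FPC's {$PACKENUM} directive is the setting meant here; the type names are hypothetical and not taken from VirtualTreeView.

```pascal
program EnumSizeDemo;
{$mode objfpc}

// Hypothetical enumerations, declared under two different minimum sizes.
{$PACKENUM 1}
type
  TSmallEnum = (seA, seB, seC);

{$PACKENUM 4}
type
  TWideEnum = (weA, weB, weC);

begin
  // SizeOf reports the storage the compiler chose for each type.
  WriteLn('SizeOf(TSmallEnum) = ', SizeOf(TSmallEnum));
  WriteLn('SizeOf(TWideEnum)  = ', SizeOf(TWideEnum));
end.
```

Printing the sizes this way is a quick check of whether a record containing such fields can match the Delphi layout byte for byte.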
The VirtualTreeView tries to make the fields of the (packed) record
aligned at dword boundary by grouping together smaller (one or two
byte fields) or adding dummy fields. Does this trick override the
unaligned memory access?
Of course it is always a good idea to sort the members of a record
On 29 Feb 2008, at 01:55, Luiz Americo Pereira Camara wrote:
One more question:
The VirtualTreeView tries to make the fields of the (packed) record
aligned at dword boundary by grouping together smaller (one or two
byte fields) or adding dummy fields. Does this trick override the
Daniël Mantione wrote:
On Tue, 26 Feb 2008, Luiz Americo Pereira Camara wrote:
Yury Sidorov wrote:
The patch removes packed record for some platforms.
IMO packed can be removed for all platforms. It will gain some speed.
I'd like to understand this issue better.
Why are non packed records
Daniël Mantione wrote:
On Fri, 29 Feb 2008, Christian Iversen wrote:
Memory access. What happens is that the non-packed version causes
more cache misses. A cache miss costs many cycles on a modern cpu, a
misaligned read just costs an extra memory access (which is fast if
cached) on x86,
On Fri, 29 Feb 2008, Christian Iversen wrote:
Memory access. What happens is that the non-packed version causes more
cache misses. A cache miss costs many cycles on a modern cpu, a misaligned
read just costs an extra memory access (which is fast if cached) on x86,
and extra load
On Fri, 29 Feb 2008, Christian Iversen wrote:
Instead unaligned will simulate an unaligned load with two loads and some
rotation etc. On the ARM, where every mnemonic can rotate operands, this
isn't that bad of a penalty.
Therefore, I wouldn't be surprised that even on ARM, arrays
On Fri, 29 Feb 2008, Christian Iversen wrote:
Daniël Mantione wrote:
On Fri, 29 Feb 2008, Christian Iversen wrote:
Instead unaligned will simulate an unaligned load with two loads and
some rotation etc. On the ARM, where every mnemonic can rotate operands,
this isn't that bad of
Daniël Mantione wrote:
On Fri, 29 Feb 2008, Christian Iversen wrote:
Instead unaligned will simulate an unaligned load with two loads
and some rotation etc. On the ARM, where every mnemonic can rotate
operands, this isn't that bad of a penalty.
Therefore, I wouldn't be surprised that
From: Daniël Mantione [EMAIL PROTECTED]
Instead unaligned will simulate an unaligned load with two loads
and some rotation etc. On the ARM, where every mnemonic can rotate
operands, this isn't that bad of a penalty.
Therefore, I wouldn't be surprised that even on ARM, arrays with
packed
Daniël Mantione wrote:
On Fri, 29 Feb 2008, Christian Iversen wrote:
Daniël Mantione wrote:
On Fri, 29 Feb 2008, Christian Iversen wrote:
Instead unaligned will simulate an unaligned load with two loads
and some rotation etc. On the ARM, where every mnemonic can rotate
operands,
Vinzent Hoefler wrote:
Are enumeration types 1 or 4 bytes in Delphi? If they are one byte, it
looks quite different (and I'm not sure about all the types used here,
some seem to be sets, some enumerations). But at the first glance it
seems, they used both packed records to either ensure
Jonas Maebe wrote:
On 29 Feb 2008, at 01:55, Luiz Americo Pereira Camara wrote:
One more question:
The VirtualTreeView tries to make the fields of the (packed) record
aligned at dword boundary by grouping together smaller (one or two
byte fields) or adding dummy fields. Does this trick
On 01 Mar 2008, at 02:00, Luiz Americo Pereira Camara wrote:
The question is: using the layout below with packed (i can force the
set size to be equal to Delphi) i still have unaligned memory access?
As long as your record is declared as packed, all memory accesses are
handled as if they
On Tue, 26 Feb 2008, Luiz Americo Pereira Camara wrote:
Yury Sidorov wrote:
The patch removes packed record for some platforms.
IMO packed can be removed for all platforms. It will gain some speed.
I'd like to understand this issue better.
Why are non packed records faster?
Cache
On Tuesday 26 February 2008 17:27, Luiz Americo Pereira Camara wrote:
Yury Sidorov wrote:
The patch removes packed record for some platforms.
IMO packed can be removed for all platforms. It will gain some
speed.
I'd like to understand this issue better.
Why are non packed records faster?
Luiz Americo Pereira Camara wrote:
Why are non packed records faster?
The difference occurs at memory allocation or at memory access?
In addition to what the others said, think of it like your 32 bit
processor suddenly being an 8 bit processor: it has to manually load 4
times 8 bit, arrange
On x86 processors it's usually only a speed penalty (or has anyone ever
seen the AC flag turned on?), on other processors you may even have to
work around exceptions (i.e. bus errors), because the processor simply
refuses to read or write unaligned data.
It is not even guaranteed (or even
Michael Schnell wrote:
If it accesses a misaligned 32 bit value it does two accesses (not 4):
e.g. once 8 bit and once 24 bit (when reading each of the accesses is
the same 32 bit, anyway).
Logically you should think about it how I explained. That Intel did an
optimization to make the speed
internally the processor still has to have separate 8 bit data paths
and do shifting to reorder the bytes.
This is a barrel shifter in the data path that is integrated in the
queue and does not take an additional execution cycle.
-Michael
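The "two loads and some rotation" idea discussed in this thread can be sketched in Free Pascal as follows. This is only an illustration under stated assumptions: little-endian byte order, and a hypothetical helper LoadUnaligned32 that is not part of the RTL or of any patch mentioned here. It models what the hardware or the compiler does rather than reproducing actual generated code.

```pascal
program UnalignedLoadDemo;
{$mode objfpc}

const
  // Two aligned 32-bit words; on a little-endian machine the bytes in
  // memory are 11 22 33 44 55 66 77 88.
  Memory: array[0..1] of LongWord = ($44332211, $88776655);

// Hypothetical helper: reads a 32-bit value at an arbitrary byte offset
// using only aligned word reads plus shifts (little-endian assumed).
function LoadUnaligned32(const Words: array of LongWord;
  ByteOffset: SizeInt): LongWord;
var
  WordIdx: SizeInt;
  Shift: Integer;
begin
  WordIdx := ByteOffset div 4;
  Shift := (ByteOffset mod 4) * 8;
  if Shift = 0 then
    // Aligned case: a single access is enough.
    Result := Words[WordIdx]
  else
    // Misaligned case: two aligned accesses, then combine the pieces.
    Result := (Words[WordIdx] shr Shift) or
              LongWord(Words[WordIdx + 1] shl (32 - Shift));
end;

begin
  // Bytes 22 33 44 55 starting at offset 1 form the value $55443322.
  if LoadUnaligned32(Memory, 1) = $55443322 then
    WriteLn('ok')
  else
    WriteLn('mismatch');
end.
```

The misaligned case costs two word reads instead of four byte reads, which matches the "two accesses (not 4)" remark above.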
On Thu, 28 Feb 2008, Vinzent Hoefler wrote:
On Thursday 28 February 2008 09:16, Daniël Mantione wrote:
Memory access. What happens is that the non-packed version causes
more cache misses.
Please elaborate. If the (unaligned) data is crossing a cache-line, thus
causing two full
On Thursday 28 February 2008 11:25, Daniël Mantione wrote:
On Thu, 28 Feb 2008, Vinzent Hoefler wrote:
On Thursday 28 February 2008 09:16, Daniël Mantione wrote:
Memory access. What happens is that the non-packed version causes
more cache misses.
OMG. I'm so confused. ;) I read that
From: Daniël Mantione [EMAIL PROTECTED]
On Thursday 28 February 2008 09:16, Daniël Mantione wrote:
Memory access. What happens is that the non-packed version causes
more cache misses.
Please elaborate. If the (unaligned) data is crossing a cache-line, thus
causing two full cache-line
On Thu, 28 Feb 2008, Yury Sidorov wrote:
Yes, but if you have an array of them (as we have in this case),
considerably more of these records will fit in the cache. Therefore you
will have considerably fewer cache misses. This becomes even more serious
when the processor in question does not
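The cache-density argument can be made concrete with a small Free Pascal sketch. The two record types and the 32 KiB cache size are assumptions chosen for illustration only; they are not taken from the graph unit or from any benchmark in this thread.

```pascal
program CacheDensityDemo;
{$mode objfpc}

type
  // Hypothetical records with identical fields, differing only in packing.
  TPackedRec = packed record
    Tag: Byte;
    Value: LongWord;
  end;

  TAlignedRec = record
    Tag: Byte;
    Value: LongWord;
  end;

const
  CacheBytes = 32 * 1024; // assumed data cache size, for illustration

begin
  // The smaller the record, the more array elements share the cache.
  WriteLn('packed:  ', SizeOf(TPackedRec), ' bytes each, ',
    CacheBytes div SizeOf(TPackedRec), ' records per cache');
  WriteLn('aligned: ', SizeOf(TAlignedRec), ' bytes each, ',
    CacheBytes div SizeOf(TAlignedRec), ' records per cache');
end.
```

With default alignment the compiler typically pads the second record, so fewer of them fit into the same cache, which is the effect described above for arrays.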
On Thu, 28 Feb 2008, Michael Schnell wrote:
An ARM does not have such logic and will suffer cache miss after cache
miss.
Nonetheless the count of word transfers between memory and the cache would
be smaller with packed records, which might result in much faster execution
(of course
Daniël Mantione wrote:
On Tue, 26 Feb 2008, Luiz Americo Pereira Camara wrote:
Yury Sidorov wrote:
The patch removes packed record for some platforms.
IMO packed can be removed for all platforms. It will gain some speed.
I'd like to understand this issue better.
Why are non packed records
Luiz Americo Pereira Camara wrote:
TVirtualNodePacked = packed record
  Index,                       // Offset 0
  ChildCount: Cardinal;        // Offset 4
  NodeHeight: Word;            // Offset 8
  States: TVirtualNodeStates;  // Offset 10 *
  Align: Byte;                 // Offset 14 **
  CheckState: TCheckState;
Are enumeration types 1 or 4 bytes in Delphi? If they are one byte, it
looks quite different (and I'm not sure about all the types used here,
some seem to be sets, some enumerations). But at the first glance it
seems, they used both packed records to either ensure minimum size or
known record
Vincent Snijders wrote:
Instead of testing for arm cpu, you could use
FPC_REQUIRES_PROPER_ALIGNMENT too. So it is fixed for sparc as well.
yes, the changed patch is attached.
Regards, Bernd.
Index: packages/graph/src/inc/gtext.inc
Bernd Mueller wrote:
Hello,
the attached patch avoids misaligned data access (bus errors), during
font rendering (with the graph unit) on Arm-Linux devices.
Instead of testing for arm cpu, you could use FPC_REQUIRES_PROPER_ALIGNMENT too. So
it is fixed for sparc as well.
Vincent
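A minimal Free Pascal sketch of Vincent's suggestion follows. It assumes only that the compiler defines FPC_REQUIRES_PROPER_ALIGNMENT on strict-alignment targets, as stated above; the record TDemoStroke is hypothetical and is not the record touched by the patch.

```pascal
program ProperAlignmentDemo;
{$mode objfpc}

type
  // Hypothetical record: packed only where unaligned access is tolerated.
  TDemoStroke = {$ifndef FPC_REQUIRES_PROPER_ALIGNMENT}packed{$endif} record
    Opcode: Byte;
    X: SmallInt;
    Y: SmallInt;
  end;

begin
  {$ifdef FPC_REQUIRES_PROPER_ALIGNMENT}
  WriteLn('strict-alignment target: record left unpacked');
  {$else}
  WriteLn('target tolerates unaligned access: record packed');
  {$endif}
  WriteLn('SizeOf(TDemoStroke) = ', SizeOf(TDemoStroke));
end.
```

Testing the alignment define instead of a specific CPU means the same source covers every target with that requirement without listing each one.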
From: Daniël Mantione [EMAIL PROTECTED]
Bernd Mueller wrote:
Hello,
the attached patch avoids misaligned data access (bus errors),
during font rendering (with the graph unit) on Arm-Linux devices.
Instead of testing for arm cpu, you could use
FPC_REQUIRES_PROPER_ALIGNMENT too.
Hello,
the attached patch avoids misaligned data access (bus errors), during
font rendering (with the graph unit) on Arm-Linux devices.
Regards, Bernd.
Index: packages/graph/src/inc/gtext.inc
===
---
On Tue, 26 Feb 2008, Vincent Snijders wrote:
Bernd Mueller wrote:
Hello,
the attached patch avoids misaligned data access (bus errors), during font
rendering (with the graph unit) on Arm-Linux devices.
Instead of testing for arm cpu, you could use FPC_REQUIRES_PROPER_ALIGNMENT
Daniël Mantione wrote:
On Tue, 26 Feb 2008, Vincent Snijders wrote:
Bernd Mueller wrote:
Hello,
the attached patch avoids misaligned data access (bus errors), during
font rendering (with the graph unit) on Arm-Linux devices.
Instead of testing for arm cpu, you could use
On Tue, 26 Feb 2008, Florian Klaempfl wrote:
Daniël Mantione wrote:
On Tue, 26 Feb 2008, Vincent Snijders wrote:
Bernd Mueller wrote:
Hello,
the attached patch avoids misaligned data access (bus errors), during
font rendering (with the graph unit) on Arm-Linux devices.
Daniël Mantione wrote:
On Tue, 26 Feb 2008, Florian Klaempfl wrote:
Daniël Mantione wrote:
On Tue, 26 Feb 2008, Vincent Snijders wrote:
Bernd Mueller wrote:
Hello,
the attached patch avoids misaligned data access (bus errors), during
font rendering (with the graph unit) on
Bernd Mueller wrote:
the main affected routines are unpack and decode. Both routines were
called for every single character (only for a stroked font) via
OutTextXYDefault. So speed is not unimportant ;-)
Perhaps you can separate I/O and processing? Read into unpacked
structure and process
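The suggestion to separate I/O from processing could look roughly like the Free Pascal sketch below. Both record types and the Unpack procedure are hypothetical; the real font data layout in gtext.inc is not reproduced here.

```pascal
program UnpackDemo;
{$mode objfpc}

type
  // Hypothetical packed layout, as it might be stored in a font file.
  TDiskStroke = packed record
    Opcode: Byte;
    X: SmallInt;
    Y: SmallInt;
  end;

  // Same fields with the compiler's default (aligned) layout.
  TMemStroke = record
    Opcode: Byte;
    X: SmallInt;
    Y: SmallInt;
  end;

// Copies field by field; only this step touches packed data.
procedure Unpack(const Src: TDiskStroke; out Dst: TMemStroke);
begin
  Dst.Opcode := Src.Opcode;
  Dst.X := Src.X;
  Dst.Y := Src.Y;
end;

var
  OnDisk: TDiskStroke;
  InMem: TMemStroke;
begin
  OnDisk.Opcode := 1;
  OnDisk.X := -5;
  OnDisk.Y := 12;
  Unpack(OnDisk, InMem);
  // All later processing would work on the aligned copy only.
  WriteLn(InMem.Opcode, ' ', InMem.X, ' ', InMem.Y);
end.
```

The packed record is read once at load time, so the per-character rendering path never performs a misaligned access.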
39 matches