Bernd Mueller wrote:
The (very unexpected) result of this benchmark is that a version which
leaves the TStroke record packed is about 13 % faster than the
original patch. I am going to send a new patch soon.
Unfortunately, this one is about 10 % slower on x86. So, I am going to
leave this to the
Daniël Mantione wrote:
>
>
> On Fri, 29 Feb 2008, Christian Iversen wrote:
>
>> Daniël Mantione wrote:
>>>
>>>
>>> On Fri, 29 Feb 2008, Christian Iversen wrote:
>>>
> Instead "unaligned" will simulate an unaligned load with two loads
> and some rotation etc. On the ARM, where every
On 01 Mar 2008, at 02:00, Luiz Americo Pereira Camara wrote:
The question is: using the layout below with packed (I can force the
set size to be equal to Delphi's), do I still have unaligned memory access?
As long as your record is declared as "packed", all memory accesses are
handled as if they
Jonas Maebe wrote:
On 29 Feb 2008, at 01:55, Luiz Americo Pereira Camara wrote:
One more question:
The VirtualTreeView tries to make the fields of the (packed) record
aligned at dword boundary by grouping together smaller (one or two
byte fields) or adding dummy fields. Does this trick over
Vinzent Hoefler wrote:
Are enumeration types 1 or 4 bytes in Delphi? If they are one byte, it
looks quite different (and I'm not sure about all the types used here,
some seem to be sets, some enumerations). But at the first glance it
seems, they used both packed records to either ensure minimum
Daniël Mantione wrote:
>
>
> On Fri, 29 Feb 2008, Christian Iversen wrote:
>
>> Daniël Mantione wrote:
>>>
>>>
>>> On Fri, 29 Feb 2008, Christian Iversen wrote:
>>>
> Instead "unaligned" will simulate an unaligned load with two loads
> and some rotation etc. On the ARM, where every m
On Fri, 29 Feb 2008, Christian Iversen wrote:
Daniël Mantione wrote:
On Fri, 29 Feb 2008, Christian Iversen wrote:
Instead "unaligned" will simulate an unaligned load with two loads and
some rotation etc. On the ARM, where every mnemonic can rotate operands,
this isn't that bad of
Daniël Mantione wrote:
On Fri, 29 Feb 2008, Christian Iversen wrote:
Instead "unaligned" will simulate an unaligned load with two loads
and some rotation etc. On the ARM, where every mnemonic can rotate
operands, this isn't that bad of a penalty.
Therefore, I wouldn't be surprised tha
From: "Daniël Mantione" <[EMAIL PROTECTED]>
Instead "unaligned" will simulate an unaligned load with two loads
and some rotation etc. On the ARM, where every mnemonic can rotate
operands, this isn't that bad of a penalty.
Therefore, I wouldn't be surprised that even on ARM, arrays with
pac
On Fri, 29 Feb 2008, Christian Iversen wrote:
Instead "unaligned" will simulate an unaligned load with two loads and some
rotation etc. On the ARM, where every mnemonic can rotate operands, this
isn't that bad of a penalty.
Therefore, I wouldn't be surprised that even on ARM, arrays wi
Daniël Mantione wrote:
On Fri, 29 Feb 2008, Christian Iversen wrote:
Memory access. What happens is that the non-packed version causes
more cache misses. A cache miss costs many cycles on a modern cpu, a
misaligned read just costs an extra memory access (which is fast if
cached) on x86, a
On Fri, 29 Feb 2008, Christian Iversen wrote:
Memory access. What happens is that the non-packed version causes more
cache misses. A cache miss costs many cycles on a modern cpu, a misaligned
read just costs an extra memory access (which is fast if cached) on x86,
and extra load instructio
Daniël Mantione wrote:
On Tue, 26 Feb 2008, Luiz Americo Pereira Camara wrote:
Yury Sidorov wrote:
The patch removes packed record for some platforms.
IMO packed can be removed for all platforms. It will gain some speed.
I'd like to understand this issue better.
Why are non packed records
On 29 Feb 2008, at 01:55, Luiz Americo Pereira Camara wrote:
One more question:
The VirtualTreeView tries to make the fields of the (packed) record
aligned at dword boundary by grouping together smaller (one or two
byte fields) or adding dummy fields. Does this trick override the
unalig
The VirtualTreeView tries to make the fields of the (packed) record
aligned at dword boundary by grouping together smaller (one or two
byte fields) or adding dummy fields. Does this trick override the
unaligned memory access?
Of course it is always a good idea to sort the members of a record
> Are enumeration types 1 or 4 bytes in Delphi? If they are one byte, it
> looks quite different (and I'm not sure about all the types used here,
> some seem to be sets, some enumerations).
Can be configured:
http://lists.freepascal.org/docs-html/prog/progsu50.html
Delphi has the minenumsize o
Are enumeration types 1 or 4 bytes in Delphi? If they are one byte, it
looks quite different (and I'm not sure about all the types used here,
some seem to be sets, some enumerations). But at the first glance it
seems, they used both packed records to either ensure minimum size or
known record l
Luiz Americo Pereira Camara wrote:
TVirtualNodePacked = packed record
  Index,                       // Offset 0
  ChildCount: Cardinal;        // Offset 4
  NodeHeight: Word;            // Offset 8
  States: TVirtualNodeStates;  // Offset 10 *
  Align: Byte;                 // Offset 14 **
  CheckState: TCheckState;     // Offset
Daniël Mantione wrote:
On Tue, 26 Feb 2008, Luiz Americo Pereira Camara wrote:
Yury Sidorov wrote:
The patch removes packed record for some platforms.
IMO packed can be removed for all platforms. It will gain some speed.
I'd like to understand this issue better.
Why are non packed records
On Thu, 28 Feb 2008, Michael Schnell wrote:
An ARM does not have such logic and will suffer cache miss after cache
miss.
Nonetheless the count of word transfers from memory to/from the cache would
be smaller with packed records which might result in a lot faster execution
(of course de
An ARM does not have such logic and will suffer cache miss after cache
miss.
Nonetheless the count of word transfers from memory to/from the cache
would be smaller with packed records which might result in a lot faster
execution (of course depending on the layout of the record, speed of the
On Thu, 28 Feb 2008, Yury Sidorov wrote:
Yes, but if you have an array of them (as we have in this case),
considerably more of these records will fit in the cache. Therefore you
will have considerably fewer cache misses. This becomes even more serious
when the processor in question does not h
From: "Daniël Mantione" <[EMAIL PROTECTED]>
> On Thursday 28 February 2008 09:16, Daniël Mantione wrote:
>
>> Memory access. What happens is that the non-packed version causes
>> more cache misses.
>
> Please elaborate. If the (unaligned) data is crossing a cache-line, thus
> causing two full
On Thursday 28 February 2008 11:25, Daniël Mantione wrote:
> On Thu, 28 Feb 2008, Vinzent Hoefler wrote:
> > On Thursday 28 February 2008 09:16, Daniël Mantione wrote:
> >> Memory access. What happens is that the non-packed version causes
> >> more cache misses.
> >
OMG. I'm so confused. ;) I
On Thu, 28 Feb 2008, Vinzent Hoefler wrote:
On Thursday 28 February 2008 09:16, Daniël Mantione wrote:
Memory access. What happens is that the non-packed version causes
more cache misses.
Please elaborate. If the (unaligned) data is crossing a cache-line, thus
causing two full cache-line
internally the processor still has to have separate "8 bit" data paths
and do shifting to reorder the bytes.
This is a barrel shifter in the data path that is integrated in the
queue and does not take an additional execution cycle.
-Michael
Michael Schnell wrote:
If it accesses a misaligned 32 bit value it does two accesses (not 4):
e.g. once 8 bit and once 24 bit (when reading each of the accesses is
the same 32 bit, anyway).
Logically you should think about it how I explained. That Intel did an
optimization to make the speed i
Micha Nelissen wrote:
In addition to what the others said, think of it like your 32 bit
processor suddenly being an 8 bit processor: it has to manually load 4
times 8 bit, arrange them into a 32 bit value, and only then use it.
With non packed, it can use the value directly.
With an x86 no addit
On x86 processors it's usually only a speed penalty (or has anyone ever
seen the AC flag turned on?), on other processors you may even have to
workaround exceptions (i.e. bus errors), because the processor simply
refuses to read or write unaligned data.
It even is not guaranteed (or even comm
Luiz Americo Pereira Camara wrote:
Why are non packed records faster?
Does the difference occur at memory allocation or at memory access?
In addition to what the others said, think of it like your 32 bit
processor suddenly being an 8 bit processor: it has to manually load 4
times 8 bit, arrange th
On Thursday 28 February 2008 09:16, Daniël Mantione wrote:
> Memory access. What happens is that the non-packed version causes
> more cache misses.
Please elaborate. If the (unaligned) data is crossing a cache-line, thus
causing two full cache-line reads, I'd understand that, but once it's
in t
On Tuesday 26 February 2008 17:27, Luiz Americo Pereira Camara wrote:
> Yury Sidorov wrote:
> > The patch removes packed record for some platforms.
> > IMO packed can be removed for all platforms. It will gain some
> > speed.
>
> I'd like to understand this issue better.
> Why are non packed records
Why are non packed records faster?
Cache thrashing. One of the most underestimated performance killers in
modern software.
Smaller (packed) records will need less cache space and thus should
be faster regarding the memory interface.
-Michael
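
The point about packed records needing less cache space can be checked with a small sketch. The record layouts and the 32-byte cache line below are assumptions made up for illustration; they are not the records discussed in this thread, and the non-packed size depends on the target and on the active record alignment settings.

```pascal
program PackedSizeDemo;

{$mode objfpc}

type
  // Hypothetical layouts, invented for this illustration only.
  TSamplePacked = packed record
    Flag: Byte;
    Value: LongWord;
    Tag: Word;
  end;

  TSampleAligned = record
    Flag: Byte;
    Value: LongWord;
    Tag: Word;
  end;

const
  // Assumed cache line size; real hardware differs.
  CacheLineBytes = 32;

begin
  // A packed record has no padding: 1 + 4 + 2 = 7 bytes.
  WriteLn('packed size:  ', SizeOf(TSamplePacked));
  // The non-packed size depends on the target and alignment settings.
  WriteLn('aligned size: ', SizeOf(TSampleAligned));
  // Whole records that fit into one (assumed) cache line.
  WriteLn('packed per line:  ', CacheLineBytes div SizeOf(TSamplePacked));
  WriteLn('aligned per line: ', CacheLineBytes div SizeOf(TSampleAligned));
end.
```

The sketch only compares sizes. Whether the smaller footprint outweighs the cost of unaligned field access still has to be measured on the target in question.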
On Tue, 26 Feb 2008, Luiz Americo Pereira Camara wrote:
Yury Sidorov wrote:
The patch removes packed record for some platforms.
IMO packed can be removed for all platforms. It will gain some speed.
I'd like to understand this issue better.
Why are non packed records faster?
Cache thrashing
Yury Sidorov wrote:
The patch removes packed record for some platforms.
IMO packed can be removed for all platforms. It will gain some speed.
I'd like to understand this issue better.
Why are non packed records faster?
Does the difference occur at memory allocation or at memory access?
Original (Del
Daniël Mantione wrote:
On Tue, 26 Feb 2008, Bernd Mueller wrote:
Daniël Mantione wrote:
On Tue, 26 Feb 2008, Florian Klaempfl wrote:
Daniël Mantione wrote:
On Tue, 26 Feb 2008, Vincent Snijders wrote:
Bernd Mueller wrote:
Hello,
the attached patch avoids misaligned data
On Tue, 26 Feb 2008, Bernd Mueller wrote:
Daniël Mantione wrote:
On Tue, 26 Feb 2008, Florian Klaempfl wrote:
Daniël Mantione wrote:
On Tue, 26 Feb 2008, Vincent Snijders wrote:
Bernd Mueller wrote:
Hello,
the attached patch avoids misaligned data access (bus errors), dur
Bernd Mueller wrote:
the main affected routines are unpack and decode. Both routines were
called for every single character (only for a stroked font) via
OutTextXYDefault. So speed is not unimportant ;-)
Perhaps you can separate I/O and processing? Read into "unpacked"
structure and process f
Daniël Mantione wrote:
On Tue, 26 Feb 2008, Florian Klaempfl wrote:
Daniël Mantione wrote:
On Tue, 26 Feb 2008, Vincent Snijders wrote:
Bernd Mueller wrote:
Hello,
the attached patch avoids misaligned data access (bus errors), during
font rendering (with the graph unit) on Arm-
On Tue, 26 Feb 2008, Florian Klaempfl wrote:
Daniël Mantione wrote:
On Tue, 26 Feb 2008, Vincent Snijders wrote:
Bernd Mueller wrote:
Hello,
the attached patch avoids misaligned data access (bus errors), during
font rendering (with the graph unit) on Arm-Linux devices.
Instea
Daniël Mantione wrote:
>
>
> On Tue, 26 Feb 2008, Vincent Snijders wrote:
>
>> Bernd Mueller wrote:
>>> Hello,
>>>
>>> the attached patch avoids misaligned data access (bus errors), during
>>> font rendering (with the graph unit) on Arm-Linux devices.
>>>
>>
>> Instead of testing for arm c
From: "Daniël Mantione" <[EMAIL PROTECTED]>
> Bernd Mueller wrote:
>> Hello,
>>
>> the attached patch avoids misaligned data access (bus errors),
>> during font rendering (with the graph unit) on Arm-Linux devices.
>>
>
> Instead of testing for arm cpu, you could use
> FPC_REQUIRES_PROPER
Vincent Snijders wrote:
Instead of testing for arm cpu, you could use
FPC_REQUIRES_PROPER_ALIGNMENT too. So it is fixed for sparc as well.
Yes, the changed patch is attached.
Regards, Bernd.
Index: packages/graph/src/inc/gtext.inc
On Tue, 26 Feb 2008, Vincent Snijders wrote:
Bernd Mueller wrote:
Hello,
the attached patch avoids misaligned data access (bus errors), during font
rendering (with the graph unit) on Arm-Linux devices.
Instead of testing for arm cpu, you could use FPC_REQUIRES_PROPER_ALIGNMENT
too.
Bernd Mueller wrote:
Hello,
the attached patch avoids misaligned data access (bus errors), during
font rendering (with the graph unit) on Arm-Linux devices.
Instead of testing for arm cpu, you could use FPC_REQUIRES_PROPER_ALIGNMENT too. So
it is fixed for sparc as well.
Vincent
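
The suggestion to test FPC_REQUIRES_PROPER_ALIGNMENT instead of a specific CPU could look roughly like the sketch below. The record TSample is hypothetical and is not the record touched by the patch, whose contents are not shown here; the sketch only illustrates switching "packed" on and off with that define.

```pascal
program AlignmentDefineDemo;

{$mode objfpc}

type
  // Hypothetical record for illustration; not the record from the patch.
{$ifdef FPC_REQUIRES_PROPER_ALIGNMENT}
  TSample = record
{$else}
  TSample = packed record
{$endif}
    Op: Byte;
    X: SmallInt;
    Y: SmallInt;
  end;

begin
{$ifdef FPC_REQUIRES_PROPER_ALIGNMENT}
  WriteLn('Target requires proper alignment; record is not packed.');
{$else}
  WriteLn('Target tolerates unaligned access; record stays packed.');
{$endif}
  WriteLn('SizeOf(TSample) = ', SizeOf(TSample));
end.
```

Because the test is on the define rather than on a CPU name, any target for which the compiler sets it gets the non-packed layout. Note that dropping "packed" changes the in-memory layout, so a record that is read verbatim from a file would need a separate conversion step.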