[fpc-devel] Unicode resourcestrings

2008-02-28 Thread Martin Schreiber
Hi,
Is there a way in current FPC to have unicode or wide resourcestrings?
Thanks,

Martin
___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel


Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-02-28 Thread Vinzent Hoefler
Are enumeration types 1 or 4 bytes in Delphi? If they are one byte, it
looks quite different (and I'm not sure about all the types used here;
some seem to be sets, some enumerations). But at first glance it seems
they used packed records either to ensure minimum size or to get a
known record layout (maybe they even used the structure in some
assembly module?), and then aligned the fields manually to avoid
unaligned access issues.


Vinzent.


Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-02-28 Thread Luiz Americo Pereira Camara

Luiz Americo Pereira Camara wrote:

> TVirtualNodePacked = packed record
>    Index,//Offset 0   ChildCount: Cardinal; //Offset 4
>    NodeHeight: Word;  //Offset 8
>    States: TVirtualNodeStates;  //Offset 10 *
>    Align: Byte;  //Offset 14 **   CheckState: TCheckState;
> //Offset 15 **
>
>    CheckType: TCheckType; //Offset 16
>    Dummy: Byte;  //Offset 17TotalCount: Cardinal; //Offset
> 18 *
>
>   [...]



TVirtualNodePacked = packed record
  Index,                       //Offset 0
  ChildCount: Cardinal;        //Offset 4

  NodeHeight: Word;            //Offset 8
  States: TVirtualNodeStates;  //Offset 10 *
  Align: Byte;                 //Offset 14 **
  CheckState: TCheckState;     //Offset 15 **

  CheckType: TCheckType;       //Offset 16
  Dummy: Byte;                 //Offset 17
  TotalCount: Cardinal;        //Offset 18 *

  [...]


The mail editor scrambled the record structure. I hope it is clearer
this time.


Luiz


Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-02-28 Thread Luiz Americo Pereira Camara

Daniël Mantione wrote:
> On Tue, 26 Feb 2008, Luiz Americo Pereira Camara wrote:
>
> > Yury Sidorov wrote:
> > > The patch removes packed record for some platforms.
> > > IMO packed can be removed for all platforms. It will gain some speed.
> >
> > I'd like to understand more this issue.
> > Why are non packed records faster?
>
> Cache thrashing. One of the most underestimated performance killers in
> modern software.
>
> > The difference occurs at memory allocation or at memory access?
>
> Memory access. What happens is that the non-packed version causes more
> cache misses. A cache miss costs many cycles on a modern CPU; a
> misaligned read just costs an extra memory access (which is fast if
> cached) on x86, and an extra load instruction on ARM. This is much
> cheaper than a cache miss.


Thanks for all the explanations. I'm sure that the change is worthwhile.

One more question:

VirtualTreeView tries to align the fields of the (packed) record on
dword boundaries by grouping smaller (one- or two-byte) fields together
or by adding dummy fields. Does this trick avoid the unaligned memory
accesses?


The real beast:

TVirtualNodePacked = packed record
   Index,//Offset 0
   ChildCount: Cardinal; //Offset 4

   NodeHeight: Word;  //Offset 8
   States: TVirtualNodeStates;  //Offset 10 *
   Align: Byte;  //Offset 14 **
   CheckState: TCheckState; //Offset 15 **

   CheckType: TCheckType; //Offset 16
   Dummy: Byte;  //Offset 17 
   TotalCount: Cardinal; //Offset 18 *

  [...]

As far as I understand, the fields marked with * cause an unaligned
access because they are not on a dword boundary. Right?
The fields marked with ** are not dword-aligned either, but since they
are one-byte fields there is no unaligned access. Right?


And what about 64-bit systems? Should the fields be qword-aligned, or is
dword alignment still sufficient?
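
The offsets in question can be checked mechanically. The following is
only a minimal sketch: TNodeSketch is a hypothetical stand-in for
TVirtualNodePacked in which States is modelled as a plain 4-byte field
and the two enumerations as single bytes (both are assumptions; the real
sizes depend on the compiler mode). It reports every multi-byte field
whose offset is not a multiple of its own size.

```pascal
program OffsetCheck;
{$mode objfpc}

type
  // Hypothetical stand-in for the record quoted in this message.
  TNodeSketch = packed record
    Index,
    ChildCount: Cardinal;  // offsets 0 and 4
    NodeHeight: Word;      // offset 8
    States: Cardinal;      // offset 10 (assumed 4-byte set)
    Align: Byte;           // offset 14
    CheckState: Byte;      // offset 15 (assumed 1-byte enumeration)
    CheckType: Byte;       // offset 16 (assumed 1-byte enumeration)
    Dummy: Byte;           // offset 17
    TotalCount: Cardinal;  // offset 18
  end;

var
  N: TNodeSketch;

// Prints the offset of a field and flags it when the offset is not a
// multiple of the field size (its natural alignment).
procedure Report(const Name: string; FieldAddr: Pointer; Size: Integer);
var
  Offset: PtrUInt;
begin
  Offset := PtrUInt(FieldAddr) - PtrUInt(@N);
  Write(Name:12, ' offset ', Offset:2, ' size ', Size);
  if (Size > 1) and (Offset mod PtrUInt(Size) <> 0) then
    WriteLn('  <- not naturally aligned')
  else
    WriteLn;
end;

begin
  Report('Index', @N.Index, SizeOf(N.Index));
  Report('ChildCount', @N.ChildCount, SizeOf(N.ChildCount));
  Report('NodeHeight', @N.NodeHeight, SizeOf(N.NodeHeight));
  Report('States', @N.States, SizeOf(N.States));
  Report('Align', @N.Align, SizeOf(N.Align));
  Report('CheckState', @N.CheckState, SizeOf(N.CheckState));
  Report('CheckType', @N.CheckType, SizeOf(N.CheckType));
  Report('Dummy', @N.Dummy, SizeOf(N.Dummy));
  Report('TotalCount', @N.TotalCount, SizeOf(N.TotalCount));
end.
```

With these assumed sizes only States (offset 10) and TotalCount (offset
18) are flagged, which matches the fields marked with a single *. The
check uses each field's own size as its alignment requirement; on that
view a 4-byte field would not need qword alignment on a 64-bit target,
although pointer-sized or 8-byte fields would.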


Luiz



Re: [fpc-devel] Lazarus: A new widgest set

2008-02-28 Thread Martin Schreiber
On Tuesday 19 February 2008 16.55:16 Martin Schreiber wrote:
> On Tuesday 19 February 2008 15.53:16 Michael Schnell wrote:
> > > If you compile the SVN trunk version with -dmse_with_ifi you will get
> > > the MSEifi components in the component palette.
> >
> > Of course I really would like to help beta-testing this. Unfortunately,
> > due to a firewall jail I am working in, I can't access an SVN.
>
> You can't use opensource projects without SVN access, you must solve the
> problem.
>
> > Have I been correct assuming that I can do a "secondary" GUI using
> > MSE(-ifi), i.e. taking a normal (existing) Delphi or Lazarus program
> > that does feature its normal GUI and add some MSE code (and widget
> > definitions) plus a transport channel and then I can create controls
> > that are visible on the screen of the remote machine. Moreover when the
> > remote user "clicks" a control that had been defined in that way, an
> > event should be triggered (in a thread  > enable event driven programming in a thread> or in the main thread).
>
> Correct. I never tried a Delphi or Lazarus applications as server, I use
> MSEgui or MSEnogui applications.
>
> > Have I been correct assuming that either a Pascal program or a browser
> > plugin (is that Java code ?) can be used as a target of the transport
> > channel, and both should show a user interface that had been defined by
> > the master program ?
>
> Correct. The browser plugin doesn't exist up to now. I think it will be a
> Pascal dll/so.
>
> > It would be great if you could send me an example (at best a windows
> > exe, using http) and the browser plugin, so that I can see what MSE can
> > do.
>
> I'll see what I can do but not in the next days.
>
I made a demo of MSEifi with a server and a client connected by pipes.
Win32 binaries:
http://msedocumenting.svn.sourceforge.net/viewvc/msedocumenting/mse/trunk/help/tutorials/mseifi/ifipipedemo/bin/i386-win32/ifipipedemoclient.exe?view=log

and
http://msedocumenting.svn.sourceforge.net/viewvc/msedocumenting/mse/trunk/help/tutorials/mseifi/ifipipedemo/bin/i386-win32/ifipipedemoserver.exe?view=log

Download ifipipedemoclient.exe and ifipipedemoserver.exe into the same
directory, cd into that directory, run ifipipedemoclient.exe, and click
'connect'.

Screenshot:
http://www.homepage.bluewin.ch/msegui/pics/mseifidemo.png

Martin


Re: [fpc-devel] Freepascal in microcontrollers

2008-02-28 Thread Vinzent Hoefler
On Thursday 28 February 2008 13:09, Michael Schnell wrote:
> > Yes. That's what {$BIT_ORDER} would stand for (still, it would not
> > change *byte* order).
>
> I don't understand this. I don't think the bit order within a byte is
> to be considered changing.

Well, the question is, if the first bit in a record is the leftmost or 
the rightmost bit.

It's a matter of interpretation. But as Jonas pointed out, the order of
the bits may change depending on the endianness (assuming I didn't
misunderstand him).

> I would call the issue "byte-order", and thus I'd prefer something
> like {$BIT_PACKED_BYTE_ORDER} or {$BIT_PACKED_ENDIAN}.

It's not byte order.

If I declare:

|bitpacked record
|   X : Byte;
|   Y : Byte;
|end;

X will still be at the lowest address and Y will be at @X + 1. The issue 
arises when I say:

|bitpacked record
|   X : Boolean;
|   Y : Boolean;
|   Z : Two_Bit_Enum;
|end;

Assuming bit 0 is the LSB, does the compiler access bits 0 and 1 (low
order first) for X and Y, or does it choose bits 7 and 6 (high order
first)? And how would it interpret a specific value for Z? At
least two interpretations are possible:

X:7, Y:6, Z[5:4]   or   X:0, Y:1, Z[3:2]

ASCII graphic:

  |X|Y|Z|Z|-|-|-|-|
  |-|-|-|-|Z|Z|Y|X|

Ok, I guess, the issue with the enum is none, because the LSB is still 
at the right place on the data bus, no matter how you look at it. So 
forget that. ;)


Of course, there are more nasty things like

|bitpacked record
|   X : Boolean;
|   Y : Byte;
|end;

where a single value would cross the byte boundary... *headscratch* I
guess there's a reason why endianness issues are not automatically
handled by the compiler. :D
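
Which of the two interpretations a given compiler and target actually
picks can be observed rather than guessed. The sketch below is only an
illustration: TTwoBitEnum and TFlags are hypothetical names mirroring
the X/Y/Z record above, and the program deliberately prints the raw
storage instead of assuming any particular result.

```pascal
program BitOrderDemo;
{$mode objfpc}

type
  TTwoBitEnum = (tbA, tbB, tbC, tbD);  // four values, so two bits suffice

  TFlags = bitpacked record
    X: Boolean;
    Y: Boolean;
    Z: TTwoBitEnum;
  end;

// Prints the raw bytes that back the record, lowest address first.
procedure Dump(const Caption: string; var F: TFlags);
var
  P: PByte;
  I: Integer;
begin
  P := @F;
  Write(Caption, ':');
  for I := 1 to SizeOf(F) do
  begin
    Write(' $', HexStr(P^, 2));
    Inc(P);
  end;
  WriteLn;
end;

var
  F: TFlags;
begin
  WriteLn('SizeOf(TFlags) = ', SizeOf(F));

  FillChar(F, SizeOf(F), 0);
  F.X := True;
  Dump('only X set', F);

  FillChar(F, SizeOf(F), 0);
  F.Y := True;
  Dump('only Y set', F);

  FillChar(F, SizeOf(F), 0);
  F.Z := tbD;
  Dump('Z = tbD   ', F);
end.
```

Running this on a little-endian and a big-endian target and comparing
the dumps shows where each field lands on that platform. The output is
an observation about one compiler version only, not a layout guarantee.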


Vinzent.


Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-02-28 Thread Daniël Mantione



On Thu, 28 Feb 2008, Michael Schnell wrote:

> > An ARM does not have such logic and will suffer cache miss after cache
> > miss.
>
> Nonetheless the count of word transfers between memory and the cache would
> be smaller with packed records, which might result in a lot faster execution
> (of course depending on the layout of the record, speed of the memory, speed
> of the processor, type of operations done with the records, ...)


That is exactly what I wanted to explain: even on ARM the lower number of
cache misses might pay for the (higher) cost of an unaligned load.


Daniël


Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-02-28 Thread Michael Schnell




> An ARM does not have such logic and will suffer cache miss after cache
> miss.

Nonetheless the count of word transfers between memory and the cache
would be smaller with packed records, which might result in a lot faster
execution (of course depending on the layout of the record, speed of the
memory, speed of the processor, type of operations done with the
records, ...)


-Michael


Re: [fpc-devel] Freepascal in microcontrollers

2008-02-28 Thread Michael Schnell


> Yes. That's what {$BIT_ORDER} would stand for (still, it would not
> change *byte* order).

I don't understand this. I don't think the bit order within a byte is to
be considered changing.


I would call the issue "byte-order", and thus I'd prefer something like
{$BIT_PACKED_BYTE_ORDER} or {$BIT_PACKED_ENDIAN}.


-Michael


Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-02-28 Thread Daniël Mantione



On Thu, 28 Feb 2008, Yury Sidorov wrote:

> > Yes, but if you have an array of them (as we have in this case),
> > considerably more of these records will fit in the cache. Therefore you
> > will have considerably fewer cache misses. This becomes even more serious
> > when the processor in question does not have prefetching; in such case,
> > traversing the array will cause cache miss after cache miss; a smaller
> > array will then have fewer of these misses.
>
> You are right. An array of packed records is a bit more efficient than an
> array of non-packed records, at least on modern x86 CPUs.
>
> I did some benchmarks and got on a Core Duo:
> 2070ms - for non-packed
> 1910ms - for packed
>
> But for CPUs which do not support misaligned data access, packed records
> are speed killers and should be used only as a last resort.

I am not 100% sure about this. Your Core Duo has an array traversal
detector which activates prefetching. An ARM does not have such logic and
will suffer cache miss after cache miss.

However, it is certain that a manual unaligned load is more expensive
on ARM than a hardware unaligned load on x86.

> Also, if a record is not an element of a large array, it is better to
> declare it as non-packed for all CPUs.

Yes.

Daniël


Re: [fpc-devel] Freepascal in microcontrollers

2008-02-28 Thread Vinzent Hoefler
On Thursday 28 February 2008 12:17, Michael Schnell wrote:
> >   Enable_Mode   : Enable_Set; // bit 14 .. 15/leftmost bits
>
> With an x86 the "leftmost bits" will be in the "rightmost" (second)
> of the two bytes,
>
> with an 68K the "leftmost bits" will be in the "leftmost" (first) of
> the two bytes,

Yes, bad example, because we already crossed the byte boundary. The real 
question about leftmost and rightmost was on which "data line" each bit 
would appear.

(Usually it's called most significant bit, but as we're talking about 
hardware bits, not numbers, I wouldn't use that term. In this context, 
no bit is necessarily more significant than the other.)

> So the two can't communicate this record via files or via network.
>
> If you want to have them understand each other, you need to define
> the endianness of the record independently of that of the processor.

Yes. That's what {$BIT_ORDER} would stand for (still, it would not 
change *byte* order).

> Enumerated types don't help here.

They weren't meant to solve the issue, they were meant to help to 
understand the issue I was trying to point out.

The question was, how to interpret the enumeration values, if their bit 
order could/would differ from that of the record they're in.

Should they be put in as is or swapped accordingly?


Vinzent.


Re: [fpc-devel] Freepascal in microcontrollers

2008-02-28 Thread Michael Schnell



>   Enable_Mode   : Enable_Set; // bit 14 .. 15/leftmost bits

With an x86 the "leftmost bits" will be in the "rightmost" (second) of
the two bytes,

with a 68K the "leftmost bits" will be in the "leftmost" (first) of the
two bytes.

So the two can't communicate this record via files or via network.

If you want to have them understand each other, you need to define the
endianness of the record independently of that of the processor.
Enumerated types don't help here.


-Michael



Re: [fpc-devel] Freepascal in microcontrollers

2008-02-28 Thread Vinzent Hoefler
On Thursday 28 February 2008 11:28, Michael Schnell wrote:
> > AFAICS, it would be useful for bitpacked records only, so it could
> > appear anywhere where a {$PACKRECORDS} directive or similar can
> > appear currently.
>
> IMHO it would only be useful (allowed with, regarded by) bitpacked
> record, as any other data representation is supposed to be optimized
> for speed according to the processor architecture.

Hmm, not necessarily. I frequently use enumeration types to express the 
meaning of a set of hardware bits. So thinking about it, what if I'd 
use enumerations in a bitpacked record? Maybe like this:

-- 8< --
type
   // 2 bits.
   Enable_Set = (Dont_Care := 0,  // 00
                 Disable   := 1,  // 01
                 Enable    := 3); // 11

type
   Control =
   bitpacked record
      Continuous_Mode   : Boolean;    // bit 0/rightmost bit
      Alternate_Compare : Boolean;    // bit 1
      ...
      Enable_Mode       : Enable_Set; // bit 14 .. 15/leftmost bits
   end;
-- 8< --


I don't know if FPC can pack enumerations into a bitpacked record at 
all, but if it does, it might consider the bit order here, too.

Consider something like:

-- 8< --
var
   My_Set: Enable_Set;
   My_Record : Control;
   

My_Set := My_Record.Enable_Mode;
-- 8< --

How should the assignment be handled if the bit order for that bitpacked 
record is changed?
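
As a point of comparison, here is a minimal sketch of what such an
assignment looks like from the program's side. It assumes that the
compiler accepts an enumeration with explicit values inside a bitpacked
record, and it shortens the Control record to three fields (the elided
middle fields are simply left out, not guessed).

```pascal
program EnumInBitpacked;
{$mode objfpc}

type
  // 2 bits.
  Enable_Set = (Dont_Care := 0, Disable := 1, Enable := 3);

  // Shortened version of the Control record; the fields elided in the
  // original are omitted here.
  Control = bitpacked record
    Continuous_Mode: Boolean;
    Alternate_Compare: Boolean;
    Enable_Mode: Enable_Set;
  end;

var
  My_Set: Enable_Set;
  My_Record: Control;
begin
  FillChar(My_Record, SizeOf(My_Record), 0);
  My_Record.Enable_Mode := Enable;

  // The compiler extracts the field; the program only ever sees the
  // enumeration value, whatever the storage order is.
  My_Set := My_Record.Enable_Mode;

  WriteLn('Ord(My_Set)     = ', Ord(My_Set));
  WriteLn('SizeOf(Control) = ', SizeOf(Control));
end.
```

Within one program the value read back is the value written, so a
changed bit order would matter only where the raw storage is visible
from outside: hardware registers, files, or the network.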


Vinzent.


Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-02-28 Thread Yury Sidorov

From: "Daniël Mantione" <[EMAIL PROTECTED]>

> > On Thursday 28 February 2008 09:16, Daniël Mantione wrote:
> >
> > > Memory access. What happens is that the non-packed version causes
> > > more cache misses.
> >
> > Please elaborate. If the (unaligned) data is crossing a cache-line,
> > thus causing two full cache-line reads, I'd understand that, but once
> > it's in the cache, it wouldn't matter anymore?
>
> Yes, but if you have an array of them (as we have in this case),
> considerably more of these records will fit in the cache. Therefore
> you will have considerably fewer cache misses. This becomes even more
> serious when the processor in question does not have prefetching; in
> such case, traversing the array will cause cache miss after cache
> miss; a smaller array will then have fewer of these misses.


You are right. An array of packed records is a bit more efficient than
an array of non-packed records, at least on modern x86 CPUs.


I did some benchmarks and got on a Core Duo:
2070ms - for non-packed
1910ms - for packed

But for CPUs which do not support misaligned data access, packed
records are speed killers and should be used only as a last resort.


Also, if a record is not an element of a large array, it is better to
declare it as non-packed for all CPUs.
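
The benchmark code itself was not posted. A comparison of this kind
could look roughly like the sketch below; the record fields, the array
length (Count) and the number of traversals (Passes) are all
assumptions chosen for illustration, so the timings it prints are not
comparable with the figures above.

```pascal
program PackedBench;
{$mode objfpc}

uses
  SysUtils, DateUtils;

const
  Count  = 1000000;  // assumed number of records
  Passes = 50;       // assumed number of traversals

type
  TNodePacked = packed record
    Flag: Byte;
    Value: Cardinal;  // lands at offset 1, so it is misaligned
    Height: Word;
    Total: Cardinal;
  end;

  TNodePlain = record  // same fields, default alignment
    Flag: Byte;
    Value: Cardinal;
    Height: Word;
    Total: Cardinal;
  end;

var
  PackedNodes: array of TNodePacked;
  PlainNodes: array of TNodePlain;
  I, P: Integer;
  Sum: QWord;
  T0: TDateTime;
begin
  SetLength(PackedNodes, Count);
  SetLength(PlainNodes, Count);
  for I := 0 to Count - 1 do
  begin
    PackedNodes[I].Value := Cardinal(I);
    PlainNodes[I].Value := Cardinal(I);
  end;
  WriteLn('record sizes: packed ', SizeOf(TNodePacked),
          ', non-packed ', SizeOf(TNodePlain));

  // Traverse the packed array repeatedly, reading the misaligned field.
  Sum := 0;
  T0 := Now;
  for P := 1 to Passes do
    for I := 0 to Count - 1 do
      Sum := Sum + PackedNodes[I].Value;
  WriteLn('packed:     ', MilliSecondsBetween(Now, T0), ' ms (sum ', Sum, ')');

  // Same traversal over the larger, aligned records.
  Sum := 0;
  T0 := Now;
  for P := 1 to Passes do
    for I := 0 to Count - 1 do
      Sum := Sum + PlainNodes[I].Value;
  WriteLn('non-packed: ', MilliSecondsBetween(Now, T0), ' ms (sum ', Sum, ')');
end.
```

The packed record here is 11 bytes, while the non-packed one is padded
to a larger size, so the packed array touches fewer cache lines at the
price of misaligned reads. Which effect wins depends on the CPU, the
array size and the optimization settings.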


Yury. 


Re: [fpc-devel] Freepascal in microcontrollers

2008-02-28 Thread Jonas Maebe


On 28 Feb 2008, at 11:17, Daniël Mantione wrote:

> On Thu, 28 Feb 2008, Jonas Maebe wrote:
>
> > It's not about Linux vs. Windows, it's about FPC 2.2.0 vs FPC
> > 3.4.0, coupled with the fact that bitpacked records as currently
> > defined are not usable for defining a specific layout.
>
> It is completely normal that a record written by a program written
> in FPC 2.2 can be read by FPC 3.4.


A regularly packed record, yes. "Non-packed" records: not at all. In
fact, their layout changed in some circumstances in FPC 2.3.1 compared
to earlier versions. As to bitpacked records:


> If you design a feature, after a grace period, it should be kept
> backward compatible.


Not if they are described like this in the manual (ref.tex, line 1204):

***
Note that the internals of the bitpacking are opaque: they can change  
at any time in the future. What is more: the internal packing depends  
on the endianness of the platform for which the compilation is done,  
and no conversion between platforms is possible. This makes bitpacked  
structures unsuitable for storing on disk or transport over networks.  
The format is however the same as the one used by the GNU Pascal  
Compiler, and we aim to retain this compatibility in the future.

***

The same goes for the internal format of sets (which also changed a  
while ago), and should also go for the layout (as far as the part  
which is normally invisible to the programmer goes) and reference  
counting of ansistrings/interfaces etc.


> I haven't heard the argument why bitpacked records should be exempt
> from this.


Because they were not designed/implemented with binary portability/ 
compatibility in mind, and doing so allows freedom to optimize them,  
make them compatible on any platform with the custom format there if  
any (e.g. how debuggers expect them to be laid out), etc.


If you don't have to fix something in concrete, it's always a good  
idea not to do so because it'll only come back later to haunt you.  
That does not mean you have to actively try to change every opaque  
structure in every release in order to break backwards compatibility,  
but it does give you the freedom to do so when it's useful.


As I said before: if you want to define something which has a  
predictable layout, you have to do so specifically and give the  
programmer the means to do so. If it is not clear what the compiler  
will/may do from just reading the declaration and active compiler  
directives, it's almost by definition improper to rely on any current  
implementation details.
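
One way for a programmer to state a layout explicitly today, without
relying on bitpacked internals, is to build the value with shifts and
masks and to serialize it in a fixed byte order. The following is a
minimal sketch under stated assumptions: the bit positions are borrowed
from the Control record example elsewhere in this thread, and all names
(PackControl, EnableModeOf, ToWire, TWireBytes) are hypothetical.

```pascal
program ExplicitLayout;
{$mode objfpc}

const
  // Assumed layout of a hypothetical 16-bit control word.
  CONTINUOUS_BIT = 0;
  ALTERNATE_BIT  = 1;
  ENABLE_SHIFT   = 14;
  ENABLE_MASK    = $3;

type
  TWireBytes = array[0..1] of Byte;

// Builds the control word from its logical fields.
function PackControl(Continuous, Alternate: Boolean; EnableMode: Byte): Word;
begin
  Result := 0;
  if Continuous then
    Result := Result or (1 shl CONTINUOUS_BIT);
  if Alternate then
    Result := Result or (1 shl ALTERNATE_BIT);
  Result := Result or ((EnableMode and ENABLE_MASK) shl ENABLE_SHIFT);
end;

// Extracts the two enable bits again.
function EnableModeOf(W: Word): Byte;
begin
  Result := (W shr ENABLE_SHIFT) and ENABLE_MASK;
end;

// Serializes most significant byte first, whatever the host endianness.
procedure ToWire(W: Word; out Bytes: TWireBytes);
begin
  Bytes[0] := Hi(W);
  Bytes[1] := Lo(W);
end;

var
  W: Word;
  B: TWireBytes;
begin
  W := PackControl(True, False, 3);
  ToWire(W, B);
  WriteLn('word = $', HexStr(W, 4));
  WriteLn('wire = $', HexStr(B[0], 2), ' $', HexStr(B[1], 2));
  WriteLn('enable mode = ', EnableModeOf(W));
end.
```

For the call shown, the word is $C001 and the two serialized bytes are
$C0 and $01 on any host, because nothing in the code depends on how the
compiler stores records.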



Jonas


Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-02-28 Thread Vinzent Hoefler
On Thursday 28 February 2008 11:25, Daniël Mantione wrote:
> Op Thu, 28 Feb 2008, schreef Vinzent Hoefler:
> > On Thursday 28 February 2008 09:16, Daniël Mantione wrote:
> >> Memory access. What happens is that the non-packed version causes
> >> more cache misses.
> >

OMG. I'm so confused. ;) I read "that the packed version causes more
cache misses" here. That was the part where I didn't understand why.

> > Please elaborate. If the (unaligned) data is crossing a cache-line,
> > thus causing two full cache-line reads, I'd understand that, but
> > once it's in the cache, it wouldn't matter anymore?
>
> Yes, but if you have an array of them (as we have in this case),
> considerably more of these records will fit in the cache.

Yes, that's what I figured, so I'm on the same path as you here, it 
seems, but tracing back the discussion it read:

-- 8< --
> I'd like to understand more this issue.
> Why are non packed records faster?

Cache trashing. One of the most underestimated performance killers in 
modern software.

> The difference occurs at memory allocation or at memory access?

Memory access. What happens is that the non-packed version causes more 
cache misses.

-- 8< --

The first part tells me non-packed records are faster, but the second 
line tells me that the non-packed version also causes more cache 
misses, thus is slower. That got me confused, I think.

Of course, the net result only depends on the benchmark you're using. ;)


Vinzent.


Re: [fpc-devel] Freepascal in microcontrollers

2008-02-28 Thread Michael Schnell


> AFAICS, it would be useful for bitpacked records only, so it could
> appear anywhere where a {$PACKRECORDS} directive or similar can appear
> currently.

IMHO it would only be useful (allowed with, regarded by) bitpacked
records, as any other data representation is supposed to be optimized for
speed according to the processor architecture.


-Michael


Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-02-28 Thread Daniël Mantione



On Thu, 28 Feb 2008, Vinzent Hoefler wrote:

> On Thursday 28 February 2008 09:16, Daniël Mantione wrote:
>
> > Memory access. What happens is that the non-packed version causes
> > more cache misses.
>
> Please elaborate. If the (unaligned) data is crossing a cache-line, thus
> causing two full cache-line reads, I'd understand that, but once it's
> in the cache, it wouldn't matter anymore?


Yes, but if you have an array of them (as we have in this case),
considerably more of these records will fit in the cache. Therefore you
will have considerably fewer cache misses. This becomes even more serious
when the processor in question does not have prefetching; in such case,
traversing the array will cause cache miss after cache miss; a smaller
array will then have fewer of these misses.


Daniël


Re: [fpc-devel] Freepascal in microcontrollers

2008-02-28 Thread Daniël Mantione



On Thu, 28 Feb 2008, Jonas Maebe wrote:

> On 28 Feb 2008, at 08:19, Daniël Mantione wrote:
>
> > As long as the compiler is consistent between platforms, it is okay.
> > Differences between little/big endian are acceptable because this is the
> > only situation where we require the coder to manually intervene and write
> > two code paths (usually a simple endian conversion). We don't force the
> > coder to make different code paths between e.g. Linux/Windows, nor should
> > we.
>
> It's not about Linux vs. Windows, it's about FPC 2.2.0 vs FPC 3.4.0, coupled
> with the fact that bitpacked records as currently defined are not usable for
> defining a specific layout.


It is completely normal that a record written by a program written in FPC
2.2 can be read by FPC 3.4. If you design a feature, after a grace period,
it should be kept backward compatible. I haven't heard the argument why
bitpacked records should be exempt from this.


Daniël


Re: [fpc-devel] Freepascal in microcontrollers

2008-02-28 Thread Jonas Maebe


On 28 Feb 2008, at 08:19, Daniël Mantione wrote:

> As long as the compiler is consistent between platforms, it is okay.
> Differences between little/big endian are acceptable because this is
> the only situation where we require the coder to manually intervene
> and write two code paths (usually a simple endian conversion). We
> don't force the coder to make different code paths between e.g.
> Linux/Windows, nor should we.


It's not about Linux vs. Windows, it's about FPC 2.2.0 vs FPC 3.4.0,  
coupled with the fact that bitpacked records as currently defined are  
not usable for defining a specific layout.


For that sort of functionality, you need extensions anyway. If someone  
wants that functionality, it's better to create such extensions so the  
programmer can in fact specify everything and implement that, rather  
than adding constraints to the implementation of bitpacked records.



Jonas


Re: [fpc-devel] Freepascal in microcontrollers

2008-02-28 Thread Vinzent Hoefler
On Thursday 28 February 2008 10:01, Michael Schnell wrote:
> > {$BITORDER LOW_ORDER_FIRST}
> > {$BITORDER HIGH_ORDER_FIRST}
>
> Where can this be used ? What exactly does it mean ?

Well, call it a proposal (of course, the names are strongly influenced by
personal language preferences).

AFAICS, it would be useful for bitpacked records only, so it could
appear anywhere where a {$PACKRECORDS} directive or similar can appear
currently.


Vinzent.


Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-02-28 Thread Michael Schnell


> internally the processor still has to have separate "8 bit" data paths
> and do shifting to reorder the bytes.

This is a barrel shifter in the data path that is integrated in the
queue and does not take an additional execution cycle.


-Michael


Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-02-28 Thread Micha Nelissen

Michael Schnell wrote:
> If it accesses a misaligned 32 bit value it does two accesses (not 4):
> e.g. once 8 bit and once 24 bit (when reading each of the accesses is
> the same 32 bit, anyway).


Logically you should think about it how I explained. That Intel did an
optimization to make the speed impact less is a different issue:
internally the processor still has to have separate "8 bit" data paths
and do shifting to reorder the bytes.


Perhaps this behaviour is specified in their optimization documents, or
maybe you have the VHDL source? :-)


> Transferring data from/to the 1st level cache imposes a lot more delay
> than the misaligned access. Thus if there are many instances of a record
> variable that are used for calculation, it might be much faster to use
> the packed version. If there are only a few, usually the unpacked
> version should be faster.


Show me the benchmark results ;-)

Micha


Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-02-28 Thread Michael Schnell

Micha Nelissen wrote:
> In addition to what the others said, think of it like your 32 bit
> processor suddenly being an 8 bit processor: it has to manually load 4
> times 8 bit, arrange them into a 32 bit value, and only then use it.
> With non packed, it can use the value directly.

With an x86 no additional code needs to be created by the compiler, as
it _can_ do misaligned accesses (there are other processors that can't
and need more code).


If it accesses a misaligned 32 bit value it does two accesses (not 4): 
e.g. once 8 bit and once 24 bit (when reading each of the accesses is 
the same 32 bit, anyway).


But all this is only internal to the core of the chip and thus _very_
fast, as the chip contains a (1st level) cache which is connected to
the second level cache (also within the chip) by a data path of 128 bits
or more.


Transferring data from/to the 1st level cache imposes a lot more delay 
than the misaligned access. Thus if there are many instances of a record 
variable that are used for calculation, it might be much faster to use 
the packed version. If there are only a few, usually the unpacked 
version should be faster.


-Michael


Re: [fpc-devel] Freepascal in microcontrollers

2008-02-28 Thread Michael Schnell



> {$BITORDER LOW_ORDER_FIRST}
> {$BITORDER HIGH_ORDER_FIRST}

Where can this be used? What exactly does it mean?

-Michael


Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-02-28 Thread Michael Schnell


> On x86 processors it's usually only a speed penalty (or has anyone ever
> seen the AC flag turned on?), on other processors you may even have to
> work around exceptions (i.e. bus errors), because the processor simply
> refuses to read or write unaligned data.

It is not even guaranteed (or even common) that a misaligned access on
a processor that can only do aligned memory accesses can be cured by an
exception.


That is why the compiler needs to create complex code for the
potentially misaligned elements of a packed record. All C compilers do
this and I am positive that FPC does it, too. So no problem here (beyond
the additional cycles needed when working with packed records).


A real problem comes up if you manipulate a pointer to a (supposedly
aligned) multi-byte variable to make it point to an odd address. This
will make the program crash on certain processors (not PCs or "big"
68Ks, but "small" 68Ks).
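
The difference between the two cases can be sketched as follows. This
is an illustration only: the buffer contents are arbitrary, and the
risky pointer dereference is left commented out so the program runs on
any target. Copying the bytes with Move into an aligned variable is one
portable way to read a multi-byte value from an odd address.

```pascal
program UnalignedRead;
{$mode objfpc}

var
  Buffer: array[0..7] of Byte;
  Value: Cardinal;
  I: Integer;
begin
  for I := 0 to High(Buffer) do
    Buffer[I] := I + 1;

  // Risky on strict-alignment CPUs: the pointer claims to address an
  // aligned Cardinal but actually points to an odd address.
  //   Value := PCardinal(@Buffer[1])^;

  // Portable alternative: copy the four bytes into an aligned variable.
  Move(Buffer[1], Value, SizeOf(Value));
  WriteLn('Value = $', HexStr(Value, 8));
end.
```

The printed value depends on the byte order of the host, which is why
no expected output is given here.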


-Michael


Re: [fpc-devel] Freepascal in microcontrollers

2008-02-28 Thread Vinzent Hoefler
On Thursday 28 February 2008 09:51, Micha Nelissen wrote:

> Well we have procedures to do byte swapping, but none to do bit
> swapping. It's also very inefficient AFAIK; while changing the
> compiler's definition of which bit to use is "free".

{$BITORDER LOW_ORDER_FIRST}
{$BITORDER HIGH_ORDER_FIRST}

?


Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-02-28 Thread Micha Nelissen

Luiz Americo Pereira Camara wrote:
> Why are non packed records faster?
> The difference occurs at memory allocation or at memory access?


In addition to what the others said, think of it like your 32 bit
processor suddenly being an 8 bit processor: it has to manually load 4
times 8 bit, arrange them into a 32 bit value, and only then use it.
With non packed, it can use the value directly.


Micha


Re: [fpc-devel] Freepascal in microcontrollers

2008-02-28 Thread Micha Nelissen

Daniël Mantione wrote:
> On Thu, 28 Feb 2008, Micha Nelissen wrote:
>
> > Jonas said the bits were swapped, not the bytes. So PPC32 (1 shl 30)
> > becomes (1 shl 6) on Intel (actual), while it should be (1 shl 1)
> > (expected use by compiler). It's both the "second bit", but it's in
> > different places.
>
> Okay, but does this have impact on the discussion? I mean it makes
> manual endian conversion a bit more tricky (also need to swap bits
> around), but doesn't change the fact that you manually need to do endian
> conversion.


Well we have procedures to do byte swapping, but none to do bit 
swapping. It's also very inefficient AFAIK; while changing the 
compiler's definition of which bit to use is "free".
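
For illustration, a hand-written bit swap could look roughly like the sketch below. ReverseBits8 is a hypothetical helper, not an existing RTL procedure, and it only handles a single byte. The loop shows why this is far more costly than a byte swap.

```pascal
program BitSwap;
{$mode objfpc}

// Reverse the bit order within one byte (bit 0 <-> bit 7, bit 1 <-> bit 6,
// and so on). Hypothetical helper for illustration only.
function ReverseBits8(B: Byte): Byte;
var
  I: Integer;
begin
  Result := 0;
  for I := 0 to 7 do
  begin
    // Shift the result left and append the current lowest bit of B.
    Result := Byte(Result shl 1) or (B and 1);
    B := B shr 1;
  end;
end;

begin
  WriteLn(ReverseBits8(1 shl 1));  // bit 1 moves to bit 6: prints 64
end.
```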


Micha


Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-02-28 Thread Vinzent Hoefler
On Thursday 28 February 2008 09:16, Daniël Mantione wrote:

> Memory access. What happens is that the non-packed version causes
> more cache misses.

Please elaborate. If the (unaligned) data is crossing a cache-line, thus 
causing two full cache-line reads, I'd understand that, but once it's 
in the cache, it wouldn't matter anymore?

IOW: How can a packed (thus smaller) record cause more cache misses than 
a better aligned (but bigger) one? That it can in certain 
circumstances, I'd understand, but as a general rule?


Vinzent.


Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-02-28 Thread Vinzent Hoefler
On Tuesday 26 February 2008 17:27, Luiz Americo Pereira Camara wrote:
> Yury Sidorov wrote:
> > The patch removes packed record for some platforms.
> > IMO packed can be removed for all platforms. It will gain some
> > speed.
>
> I'd like to understand this issue better.
> Why are non packed records faster?
> The difference occurs at memory allocation or at memory access?

At memory access.

On x86 processors it's usually only a speed penalty (or has anyone ever
seen the AC flag turned on?); on other processors you may even have to
work around exceptions (i.e. bus errors), because the processor simply
refuses to read or write unaligned data. And then the only way to
circumvent the processor's refusal is to read/write the data byte by
byte or mask it out, which is slower than just reading or writing it.

Consider writing a 16-bit value spanning across two 32-bit words where the
processor can only access a single 32-bit value at an aligned address:

*_ _ _ _*_ _ _ _
|0|1|2|3|4|5|6|7|
      |___|

Now the data you need is spanning across bytes [3:4], but the processor
can only read full 32 bits either at position 0 (reading bytes [0:3]),
or position 4 (reading bytes [4:7]). You'd need to read both processor
words, mask out the affected part of each, and write back both words
with the new data patched in between them.

So by now, no matter if the processor handles it for you or if the 
compiler would insert the necessary code to do it, even a simple 
increment is insanely expensive in terms of processor cycles.
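
As a rough sketch of that read-mask-write sequence: the code below stores a 16-bit value that straddles the boundary between two aligned 32-bit words (bytes 3 and 4), using whole-word accesses only. TWords and StoreWordAt3 are made-up names, and little-endian byte numbering inside each word is an assumption.

```pascal
program UnalignedStore;
{$mode objfpc}

type
  TWords = array[0..1] of Cardinal;  // two aligned 32-bit words = bytes 0..7

// Store a 16-bit value at byte offsets 3 and 4 using only whole-word
// accesses. Assumes little-endian byte numbering inside each word: byte 3
// is the top byte of word 0, byte 4 is the bottom byte of word 1.
procedure StoreWordAt3(var Mem: TWords; Value: Word);
begin
  // Read word 0, clear its top byte, patch in the low byte of Value.
  Mem[0] := (Mem[0] and $00FFFFFF) or (Cardinal(Value and $FF) shl 24);
  // Read word 1, clear its bottom byte, patch in the high byte of Value.
  Mem[1] := (Mem[1] and $FFFFFF00) or Cardinal(Value shr 8);
end;

var
  Mem: TWords = ($33221100, $77665544);

begin
  StoreWordAt3(Mem, $ABCD);
  WriteLn(HexStr(Mem[0], 8), ' ', HexStr(Mem[1], 8));  // CD221100 776655AB
end.
```

Two loads, two masks, two ORs and two stores replace what would be a single store at an aligned address.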


Vinzent.


Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-02-28 Thread Michael Schnell



> > Why are non packed records faster?
>
> Cache thrashing. One of the most underestimated performance killers in
> modern software.

Smaller (packed) records will need less cache space and thus should
be faster regarding the memory interface.


-Michael


Re: [fpc-devel] Freepascal in microcontrollers

2008-02-28 Thread Michael Schnell




> The only thing I want to guarantee is that blockwrite followed by
> blockread on a platform with the same endianness works and will work
> in the future.
>
> This (combined with ifdef based endian conversion) guarantees
> portability of structures to any platform.
>
> Or do I see this wrong?


IMHO, this is not enough.

AFAIK, there is an FP version for a high-endian processor (68K); more
can be crafted any time. It should be made easy to create communication
systems independent of the architecture. Of course you can't have binary
compatibility with all values by default (due to performance
considerations), but if we do have a "bitpacked" type that is optimized
not for speed but for structure, it should be possible to use it
for that purpose.


Moreover, communication via structures in a documented layout is very
often needed (network, files, hardware, ...). There should be an easy
way to craft a record type according to such documentation, whether it is
documented to hold its multi-byte values in high- or low-endian
representation. "bitpacked" opened this Pandora's box, and it's an
obvious request to go all the way :) .


(E.g.: TCP/IP defines high-endian, while PCs work low-endian
internally, so everything needs to be converted.)
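
For reference, the manual conversion for the TCP/IP case might be sketched as below. NtoBE is, as far as I know, provided by FPC's System unit; TWire16 and ToNetwork16 are made-up names for illustration.

```pascal
program NetOrder;
{$mode objfpc}

type
  TWire16 = array[0..1] of Byte;  // two bytes as they would go on the wire

// Convert a native 16-bit value to big-endian ("network") byte order.
// NtoBE is a no-op on big-endian CPUs and a byte swap on little-endian ones.
function ToNetwork16(Value: Word): TWire16;
var
  BE: Word;
begin
  BE := NtoBE(Value);
  Move(BE, Result, SizeOf(BE));
end;

var
  W: TWire16;

begin
  W := ToNetwork16($1F90);  // 8080 decimal
  // The high byte comes first regardless of the host CPU: prints 1F90
  WriteLn(HexStr(W[0], 2), HexStr(W[1], 2));
end.
```

This is the "active code" a user has to write today for each multi-byte field.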


-Michael


Re: [fpc-devel] Freepascal in microcontrollers

2008-02-28 Thread Daniël Mantione



On Thu, 28 Feb 2008, Micha Nelissen wrote:

> Daniël Mantione wrote:
> > To my knowledge there is no problem with the current implementation.
> > Endian conversion is already the responsibility of the programmer.
> > Therefore I don't see a need for changes on the compiler side.
>
> Jonas said the bits were swapped, not the bytes. So PPC32 (1 shl 30)
> becomes (1 shl 6) on Intel (actual), while it should be (1 shl 1)
> (expected use by compiler). It's both the "second bit", but it's in
> different places.


Okay, but does this have impact on the discussion? I mean it makes 
manual endian conversion a bit more tricky (also need to swap bits 
around), but doesn't change the fact that you manually need to do endian 
conversion.


Daniël


Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-02-28 Thread Daniël Mantione



On Tue, 26 Feb 2008, Luiz Americo Pereira Camara wrote:

> Yury Sidorov wrote:
>
> > The patch removes packed record for some platforms.
> > IMO packed can be removed for all platforms. It will gain some speed.
>
> I'd like to understand this issue better.
> Why are non packed records faster?

Cache thrashing. One of the most underestimated performance killers in
modern software.

> The difference occurs at memory allocation or at memory access?

Memory access. What happens is that the non-packed version causes more
cache misses. A cache miss costs many cycles on a modern CPU; a misaligned
read just costs an extra memory access (which is fast if cached) on x86,
and an extra load instruction on ARM. This is much cheaper than a cache miss.
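
A small sketch of the size effect, for anyone who wants to try it. The record fields are invented for illustration, and the 64-byte cache line is an assumption that varies by CPU. The size of the non-packed record depends on the target's alignment rules, so only the packed size is fixed.

```pascal
program PackedSize;
{$mode objfpc}

type
  // Same fields, once with default alignment and once packed.
  TNode = record
    Index: Cardinal;
    Height: Word;
    Flag: Byte;
    Total: Cardinal;
  end;

  TNodePacked = packed record
    Index: Cardinal;
    Height: Word;
    Flag: Byte;
    Total: Cardinal;
  end;

const
  CacheLine = 64;  // assumed cache line size in bytes; varies by CPU

begin
  WriteLn('aligned: ', SizeOf(TNode), ' bytes, ',
          CacheLine div SizeOf(TNode), ' per cache line');
  WriteLn('packed:  ', SizeOf(TNodePacked), ' bytes, ',
          CacheLine div SizeOf(TNodePacked), ' per cache line');
end.
```

The packed record is 11 bytes (4 + 2 + 1 + 4) with no padding. How much larger the aligned variant is depends on the compiler settings and target.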


Daniël


Re: [fpc-devel] Freepascal in microcontrollers

2008-02-28 Thread Michael Schnell


> To my knowledge there is no problem with the current implementation.
> Endian conversion is already the responsibility of the programmer.
> Therefore I don't see a need for changes on the compiler side.

It might be possible to define the individual bytes of a certain value
in a bitpacked record to be located at certain bit positions by some
complicated syntax, but if that is required for binary compatibility,
that is not "nice" at all. A compiler option to optionally define the
endianness is much more handy.


-Michael


Re: [fpc-devel] Freepascal in microcontrollers

2008-02-28 Thread Daniël Mantione



On Thu, 28 Feb 2008, Michael Schnell wrote:

> > To my knowledge there is no problem with the current implementation.
> > Endian conversion is already the responsibility of the programmer.
> > Therefore I don't see a need for changes on the compiler side.
>
> I don't understand your meaning here.
>
> If a record with binary fields is defined and transferred to (e.g.) another
> instance of the same program running on another machine by network or within
> a file, or if it needs to be crafted according to a specification of a
> network block or file format or a hardware device, endianness needs to be
> taken care of, and it's not nice if the user needs to write active code to
> do this, if a "bitpacked" type is available that seemingly can be used for
> that issue.
>
> You even might want to define a record that just holds a single 16 bit
> value. Here it would be good to be able to define endianness of the
> bitpacked structure to make it compatible with the communication partner.

Fixed endianness of records could be a language extension; it might even
relieve programmers of doing endian conversion (rather than doing ifdefs,
just mention the endianness of your record), which would be much nicer.


However, the situation as it is, is that we do not have such a language 
feature and rely on manual endian conversion by means of ifdefs. There is 
no difference between normal, packed, or bitpacked records, you have to 
endian convert them manually.
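
The ifdef-based conversion meant here could be sketched as follows, for a record stored in little-endian order in a file. THeader, FixEndian and FirstFileByte are made-up names. ENDIAN_BIG is the symbol FPC defines on big-endian targets, and SwapEndian comes from the System unit, as far as I know.

```pascal
program RecordEndian;
{$mode objfpc}

type
  THeader = packed record
    Magic: Cardinal;
    Size: Word;
  end;

// Convert a header between native and little-endian file order in place.
// The field-by-field swap is only compiled in on big-endian targets.
procedure FixEndian(var H: THeader);
begin
{$IFDEF ENDIAN_BIG}
  H.Magic := SwapEndian(H.Magic);
  H.Size := SwapEndian(H.Size);
{$ENDIF}
end;

// Return the first byte of the header as it would be written to the file.
function FirstFileByte(H: THeader): Byte;
begin
  FixEndian(H);
  Result := PByte(@H)^;
end;

var
  H: THeader;

begin
  H.Magic := $46504331;
  H.Size := 6;
  // The lowest byte of Magic comes first on any host: prints 31
  WriteLn(HexStr(FirstFileByte(H), 2));
end.
```

The same procedure has to be written whether the record is normal, packed or bitpacked, and it must be kept in sync with the record definition by hand.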


Daniël


Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

2008-02-28 Thread Luiz Americo Pereira Camara

Yury Sidorov wrote:

> The patch removes packed record for some platforms.
> IMO packed can be removed for all platforms. It will gain some speed.

I'd like to understand this issue better.
Why are non packed records faster?
The difference occurs at memory allocation or at memory access?

The original (Delphi) VirtualTreeView uses packed records, and I'm
considering removing them in the LCL port.


Luiz



Re: [fpc-devel] Freepascal in microcontrollers

2008-02-28 Thread Michael Schnell


> The compiler can only care about processor endianness. Having a known
> binary structure is something different from being usable for hardware
> access.

Right. AFAIK, even C does not do this in a language construct. But FP
might be better than standard :) .


-Michael


Re: [fpc-devel] Freepascal in microcontrollers

2008-02-28 Thread Micha Nelissen

Daniël Mantione wrote:

> To my knowledge there is no problem with the current implementation.
> Endian conversion is already the responsibility of the programmer.
> Therefore I don't see a need for changes on the compiler side.

Jonas said the bits were swapped, not the bytes. So PPC32 (1 shl 30)
becomes (1 shl 6) on Intel (actual), while it should be (1 shl 1)
(expected use by compiler). It's both the "second bit", but it's in
different places.


Micha


Re: [fpc-devel] Freepascal in microcontrollers

2008-02-28 Thread Daniël Mantione



On Thu, 28 Feb 2008, Michael Schnell wrote:

> > C-style bitpacking ("char c:1" and "int c:1" are often laid out
> > differently in C depending on the previous fields,
>
> Not only this. C defines the layout as implementation-dependent. I once was
> bitten by this when porting a networked project from a low-endian processor
> to a high-endian processor :( .
>
> Thus if we want binary portability of the structures we need to be better
> than C (optionally defining the structure as high-endian or low-endian on
> user request)


Why?

The only thing I want to guarantee is that blockwrite followed by 
blockread on a platform with the same endianness works and will work in 
the future.


This (combined with ifdef based endian conversion) guarantees portability 
of structures to any platform.


Or do I see this wrong?

Daniël


Re: [fpc-devel] Freepascal in microcontrollers

2008-02-28 Thread Michael Schnell


> To my knowledge there is no problem with the current implementation.
> Endian conversion is already the responsibility of the programmer.
> Therefore I don't see a need for changes on the compiler side.

I don't understand your meaning here.

If a record with binary fields is defined and transferred to (e.g.)
another instance of the same program running on another machine by
network or within a file, or if it needs to be crafted according to a
specification of a network block or file format or a hardware device,
endianness needs to be taken care of, and it's not nice if the user needs
to write active code to do this, if a "bitpacked" type is available that
seemingly can be used for that issue.


You even might want to define a record that just holds a single 16 bit
value. Here it would be good to be able to define endianness of the
bitpacked structure to make it compatible with the communication partner.


-Michael