Re: [fpc-devel] Case block optimisation

2018-12-30 Thread J. Gareth Moreton
 So it looks like the teething problems with my case node optimisation have
been resolved, at least for now.  Granted, platforms outside of Intel
haven't been heavily tested yet, I don't think.  I'm worried that
https://bugs.freepascal.org/view.php?id=34782 might be because of it, if
only because it's within a case block, although there's nothing to say it
is because of that yet.

 Gareth aka. Kit
 ___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel


Re: [fpc-devel] Case block optimisation

2018-12-27 Thread J. Gareth Moreton
 Hi everyone.
 This is just a follow-up on https://bugs.freepascal.org/view.php?id=34762,
since the issue has been marked as resolved now.

 The apparent regression doesn't seem to exist.  Both the trunk (before it
was updated) and the patch produce a linear list for the test in question -
e.g.:

 .section
.text.n_p$casebranchtest$_$tsparsedataequal2_$__$$_dotestiteration$longint,"x"
     .balign 16,0x90
 .globl   
P$CASEBRANCHTEST$_$TSPARSEDATAEQUAL2_$__$$_DOTESTITERATION$LONGINT
 P$CASEBRANCHTEST$_$TSPARSEDATAEQUAL2_$__$$_DOTESTITERATION$LONGINT:
 .Lc188:
     andl    $1023,%edx
     movswl    %dx,%eax
     subl    $512,%eax
     movswl    %dx,%edx
     cmpl    $-509,%eax
     je    .Lj501
     cmpl    $-353,%eax
     je    .Lj535
     cmpl    $-329,%eax
     je    .Lj502
     

 The only difference are the label numbers.  You are right though Florian
- in this instance, a jump table should not be used because the domain is
massive at 1,024, while only 39 of those values go to branches.  If a jump
table were used, it would be 4 KB in size, and 985 of the 1,024 entries
would point to the else block.  I'm not sure why you got such a huge time
difference.

 I'd like to keep these tests around though because those "Domain is 1,024"
tests are rather extreme cases that could be fun to find optimisations for
(if any exist).

 Gareth aka. Kit
 ___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel


Re: [fpc-devel] Case block optimisation

2018-12-27 Thread J. Gareth Moreton
 So I've been doing a bit more research, and I think my binary search
algorithm is, unfortunately, a failure.  Even after restructuring my
lookup table and making some minute optimisations, it's still slower
overall in my test program, and slower than a jump tree if that gets
used.  I'm figuring the slowdowns come from constantly polling the lookup
table.  Oh well, you can't win them all!

 Nevertheless, I believe I have learnt some things and I may be able to
build some special cases.  The case block used in the peephole optimiser
is a particularly challenging test case for me because it contains a number
of branches, but are all very sparse, since the domain of values is almost
1,000.

 Gareth aka. Kit

 On Wed 26/12/18 00:39 , "J. Gareth Moreton" gar...@moreton-family.com
sent:
  Just to clarify, the patch does NOT contain the binary search
algorithm.  It does contain a utility function that was accidentally left
over when I stripped the code out, named "CreateCMOVInstr", but this can be
safely removed.

 Gareth aka. Kit

 On Wed 26/12/18 00:27 , "J. Gareth Moreton" gar...@moreton-family.com
sent:
  Merry Christmas everyone!

 I've uploaded my improvements to a bug report, along with a test program
that gives timings and verifies correctness.

 https://bugs.freepascal.org/view.php?id=34762

 It turns out that my binary search algorithm runs slower on average
"Domain of 1024" tests (which are based on the design of the peephole
optimizer) that I had set up (which currently still compile into linear
lists) except in the case where the correct branch is the midpoint of the
valid set (in a jump tree, this is the first value tested).  Jump trees
are also faster than a binary search currently despite the large number of
conditional branches required - nevertheless, I will continue to research
this algorithm to see if it can find a role - I suspect that it might still
find a place where each branch calls an identically-structured procedure...
if I can get the compiler to identify that in the branches.  Successfully
doing this will also improve case blocks that are compiled into jump tables
instead.

 Gareth aka. Kit

 On Sat 15/12/18 19:12 , "J. Gareth Moreton" gar...@moreton-family.com
sent:
  I'll see what I can do!

 The one thing about my case optimisations here is that they're done at the
node level.  Some parts are platform-independent, but unfortunately,
things like jump table generation have to be implemented for each platform.

 The aint shuffling is a little difficult to test for sure, but what's
worse is the use of TConstExprInt, and not just because of the overhead
involved with reading and writing to them.  I used logic in some places
though, like when I do a sort of for-loop between max_ and hv, I subtracted
1 from each variable (sometimes implicitly) to account for a potential
overflow situation.  It's impossible for it to trigger an underflow
because max_ is greater than min_, and they wouldn't be equal otherwise a
jump table won't even be generated.

 Either way, I'm just enjoying myself with this!

 Gareth aka. Kit

 On Sat 15/12/18 19:55 , Martok list...@martoks-place.de sent:
 Am 15.12.2018 um 18:08 schrieb J. Gareth Moreton: 
 > So here's something else I've been playing with in the interim... I've
been 
 > working on improving the algorithms used when compiling case blocks. 
It was 
 > driven partly by my binary search idea for certain kinds of case block,
which I 
 > wrote up over here:
http://wiki.freepascal.org/Case_Compiler_Optimization
[1]">http://wiki.freepascal.org/Case_Compiler_Optimization 

 Feel free to include '0033093_Casenode_order.patch' from 
  [2]; 
 It's actually already a bit simpler in your version ;-) 

 The one thing that still bugs me (but I know it's not really seen as a
problem 
 in FPC) is that every platform cg theoretically has to implement all of
that, 
 causing a lot of duplication. 

 I like the use of more expressive intermediate names. But there was a lot
of 
 overflowed-aint-shuffling, are you sure that still works everywhere? 

 Bug 0032115 provides a "nice" test case for things that can go wrong with 
 different word sizes, and is also a good test for the true label count. 

 -- 
 Regards, 
 Martok 

 ___ 
 fpc-devel maillist - fpc-devel@lists.freepascal.org [3] 
 http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
[4]">http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel 

  ___
 fpc-devel maillist - fpc-devel@lists.freepascal.org [5]
 http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
[6]">http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel

  ___
 fpc-devel maillist - fpc-devel@lists.freepascal.org [7]
 http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
[8]">http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel

  

Re: [fpc-devel] Case block optimisation

2018-12-25 Thread J. Gareth Moreton
 Just to clarify, the patch does NOT contain the binary search algorithm. 
It does contain a utility function that was accidentally left over when I
stripped the code out, named "CreateCMOVInstr", but this can be safely
removed.

 Gareth aka. Kit

 On Wed 26/12/18 00:27 , "J. Gareth Moreton" gar...@moreton-family.com
sent:
  Merry Christmas everyone!

 I've uploaded my improvements to a bug report, along with a test program
that gives timings and verifies correctness.

 https://bugs.freepascal.org/view.php?id=34762

 It turns out that my binary search algorithm runs slower on average
"Domain of 1024" tests (which are based on the design of the peephole
optimizer) that I had set up (which currently still compile into linear
lists) except in the case where the correct branch is the midpoint of the
valid set (in a jump tree, this is the first value tested).  Jump trees
are also faster than a binary search currently despite the large number of
conditional branches required - nevertheless, I will continue to research
this algorithm to see if it can find a role - I suspect that it might still
find a place where each branch calls an identically-structured procedure...
if I can get the compiler to identify that in the branches.  Successfully
doing this will also improve case blocks that are compiled into jump tables
instead.

 Gareth aka. Kit

 On Sat 15/12/18 19:12 , "J. Gareth Moreton" gar...@moreton-family.com
sent:
  I'll see what I can do!

 The one thing about my case optimisations here is that they're done at the
node level.  Some parts are platform-independent, but unfortunately,
things like jump table generation have to be implemented for each platform.

 The aint shuffling is a little difficult to test for sure, but what's
worse is the use of TConstExprInt, and not just because of the overhead
involved with reading and writing to them.  I used logic in some places
though, like when I do a sort of for-loop between max_ and hv, I subtracted
1 from each variable (sometimes implicitly) to account for a potential
overflow situation.  It's impossible for it to trigger an underflow
because max_ is greater than min_, and they wouldn't be equal otherwise a
jump table won't even be generated.

 Either way, I'm just enjoying myself with this!

 Gareth aka. Kit

 On Sat 15/12/18 19:55 , Martok list...@martoks-place.de sent:
 Am 15.12.2018 um 18:08 schrieb J. Gareth Moreton: 
 > So here's something else I've been playing with in the interim... I've
been 
 > working on improving the algorithms used when compiling case blocks. 
It was 
 > driven partly by my binary search idea for certain kinds of case block,
which I 
 > wrote up over here:
http://wiki.freepascal.org/Case_Compiler_Optimization
[1]">http://wiki.freepascal.org/Case_Compiler_Optimization 

 Feel free to include '0033093_Casenode_order.patch' from 
  [2]; 
 It's actually already a bit simpler in your version ;-) 

 The one thing that still bugs me (but I know it's not really seen as a
problem 
 in FPC) is that every platform cg theoretically has to implement all of
that, 
 causing a lot of duplication. 

 I like the use of more expressive intermediate names. But there was a lot
of 
 overflowed-aint-shuffling, are you sure that still works everywhere? 

 Bug 0032115 provides a "nice" test case for things that can go wrong with 
 different word sizes, and is also a good test for the true label count. 

 -- 
 Regards, 
 Martok 

 ___ 
 fpc-devel maillist - fpc-devel@lists.freepascal.org [3] 
 http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
[4]">http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel 

  ___
 fpc-devel maillist - fpc-devel@lists.freepascal.org [5]
 http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
[6]">http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel

  ___
 fpc-devel maillist - fpc-devel@lists.freepascal.org [7]
 http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
[8]">http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel

 

Links:
--
[1] http://secureweb.fast.net.uk/ http:=
[2] https://bugs.freepascal.org/view.php?id=33093>
[3] mailto:fpc-devel@lists.freepascal.org
[4] http://secureweb.fast.net.uk/ http:=
[5] mailto:fpc-devel@lists.freepascal.org
[6] http://secureweb.fast.net.uk/ http:=
[7] mailto:fpc-devel@lists.freepascal.org
[8] http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel


Re: [fpc-devel] Case block optimisation

2018-12-25 Thread J. Gareth Moreton
 Merry Christmas everyone!

 I've uploaded my improvements to a bug report, along with a test program
that gives timings and verifies correctness.

 https://bugs.freepascal.org/view.php?id=34762

 It turns out that my binary search algorithm runs slower on average
"Domain of 1024" tests (which are based on the design of the peephole
optimizer) that I had set up (which currently still compile into linear
lists) except in the case where the correct branch is the midpoint of the
valid set (in a jump tree, this is the first value tested).  Jump trees
are also faster than a binary search currently despite the large number of
conditional branches required - nevertheless, I will continue to research
this algorithm to see if it can find a role - I suspect that it might still
find a place where each branch calls an identically-structured procedure...
if I can get the compiler to identify that in the branches.  Successfully
doing this will also improve case blocks that are compiled into jump tables
instead.

 Gareth aka. Kit

 On Sat 15/12/18 19:12 , "J. Gareth Moreton" gar...@moreton-family.com
sent:
  I'll see what I can do!

 The one thing about my case optimisations here is that they're done at the
node level.  Some parts are platform-independent, but unfortunately,
things like jump table generation have to be implemented for each platform.

 The aint shuffling is a little difficult to test for sure, but what's
worse is the use of TConstExprInt, and not just because of the overhead
involved with reading and writing to them.  I used logic in some places
though, like when I do a sort of for-loop between max_ and hv, I subtracted
1 from each variable (sometimes implicitly) to account for a potential
overflow situation.  It's impossible for it to trigger an underflow
because max_ is greater than min_, and they wouldn't be equal otherwise a
jump table won't even be generated.

 Either way, I'm just enjoying myself with this!

 Gareth aka. Kit

 On Sat 15/12/18 19:55 , Martok list...@martoks-place.de sent:
 Am 15.12.2018 um 18:08 schrieb J. Gareth Moreton: 
 > So here's something else I've been playing with in the interim... I've
been 
 > working on improving the algorithms used when compiling case blocks. 
It was 
 > driven partly by my binary search idea for certain kinds of case block,
which I 
 > wrote up over here:
http://wiki.freepascal.org/Case_Compiler_Optimization
[1]">http://wiki.freepascal.org/Case_Compiler_Optimization 

 Feel free to include '0033093_Casenode_order.patch' from 
  [2]; 
 It's actually already a bit simpler in your version ;-) 

 The one thing that still bugs me (but I know it's not really seen as a
problem 
 in FPC) is that every platform cg theoretically has to implement all of
that, 
 causing a lot of duplication. 

 I like the use of more expressive intermediate names. But there was a lot
of 
 overflowed-aint-shuffling, are you sure that still works everywhere? 

 Bug 0032115 provides a "nice" test case for things that can go wrong with 
 different word sizes, and is also a good test for the true label count. 

 -- 
 Regards, 
 Martok 

 ___ 
 fpc-devel maillist - fpc-devel@lists.freepascal.org [3] 
 http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
[4]">http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel 

  ___
 fpc-devel maillist - fpc-devel@lists.freepascal.org [5]
 http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
[6]">http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel

 

Links:
--
[1] http://secureweb.fast.net.uk/ http:=
[2] https://bugs.freepascal.org/view.php?id=33093>
[3] mailto:fpc-devel@lists.freepascal.org
[4] http://secureweb.fast.net.uk/ http:=
[5] mailto:fpc-devel@lists.freepascal.org
[6] http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel


Re: [fpc-devel] Case block optimisation

2018-12-15 Thread J. Gareth Moreton
 I'll see what I can do!

 The one thing about my case optimisations here is that they're done at the
node level.  Some parts are platform-independent, but unfortunately,
things like jump table generation have to be implemented for each platform.

 The aint shuffling is a little difficult to test for sure, but what's
worse is the use of TConstExprInt, and not just because of the overhead
involved with reading and writing to them.  I used logic in some places
though, like when I do a sort of for-loop between max_ and hv, I subtracted
1 from each variable (sometimes implicitly) to account for a potential
overflow situation.  It's impossible for it to trigger an underflow
because max_ is greater than min_, and they wouldn't be equal otherwise a
jump table won't even be generated.

 Either way, I'm just enjoying myself with this!

 Gareth aka. Kit

 On Sat 15/12/18 19:55 , Martok list...@martoks-place.de sent:
 Am 15.12.2018 um 18:08 schrieb J. Gareth Moreton: 
 > So here's something else I've been playing with in the interim... I've
been 
 > working on improving the algorithms used when compiling case blocks. 
It was 
 > driven partly by my binary search idea for certain kinds of case block,
which I 
 > wrote up over here:
http://wiki.freepascal.org/Case_Compiler_Optimization
[1]">http://wiki.freepascal.org/Case_Compiler_Optimization 

 Feel free to include '0033093_Casenode_order.patch' from 
  [2]; 
 It's actually already a bit simpler in your version ;-) 

 The one thing that still bugs me (but I know it's not really seen as a
problem 
 in FPC) is that every platform cg theoretically has to implement all of
that, 
 causing a lot of duplication. 

 I like the use of more expressive intermediate names. But there was a lot
of 
 overflowed-aint-shuffling, are you sure that still works everywhere? 

 Bug 0032115 provides a "nice" test case for things that can go wrong with 
 different word sizes, and is also a good test for the true label count. 

 -- 
 Regards, 
 Martok 

 ___ 
 fpc-devel maillist - fpc-devel@lists.freepascal.org [3] 
 http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
[4]">http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel 

 

Links:
--
[1] http://wiki.freepascal.org/Case_Compiler_Optimization
[2] https://bugs.freepascal.org/view.php?id=33093
[3] mailto:fpc-devel@lists.freepascal.org
[4] http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel


Re: [fpc-devel] Case block optimisation

2018-12-15 Thread Martok
Am 15.12.2018 um 18:08 schrieb J. Gareth Moreton:
> So here's something else I've been playing with in the interim... I've been
> working on improving the algorithms used when compiling case blocks.  It was
> driven partly by my binary search idea for certain kinds of case block, which 
> I
> wrote up over here: http://wiki.freepascal.org/Case_Compiler_Optimization

Feel free to include '0033093_Casenode_order.patch' from

It's actually already a bit simpler in your version ;-)

The one thing that still bugs me (but I know it's not really seen as a problem
in FPC) is that every platform cg theoretically has to implement all of that,
causing a lot of duplication.

I like the use of more expressive intermediate names. But there was a lot of
overflowed-aint-shuffling, are you sure that still works everywhere?

Bug 0032115 provides a "nice" test case for things that can go wrong with
different word sizes, and is also a good test for the true label count.


-- 
Regards,
Martok

___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel


[fpc-devel] Case block optimisation

2018-12-15 Thread J. Gareth Moreton
 Hi everyone,

 So here's something else I've been playing with in the interim... I've
been working on improving the algorithms used when compiling case blocks. 
It was driven partly by my binary search idea for certain kinds of case
block, which I wrote up over here:
http://wiki.freepascal.org/Case_Compiler_Optimization [1]

 So far, the binary search is a bit inconclusive on timing improvements, so
I need to work on it a bit more and also write some tests to show correct
operation and timings.

 Nevertheless, find attached a preliminary patch that improves other
aspects of code generation:

 - If a case block only contains one branch (not including the else block),
the initial range check is removed, since this becomes wasted effort.- If
the else block is empty, the else label is set to the end label - though
this doesn't increase the code size, it takes a bit of strain off the
peephole optimizer.
 - On -O2 and above, some node analysis is now done on the branch labels. 
Most of the time this just redirects it to the end label for empty blocks,
but if the block contains a goto statement, it will redirect it to its
destination instead, thus increasing performance by not having multiple
jumps (this won't get picked up by the peephole optimiser if the label
addresses are in a jump table).
 - Some checks now use what I call the 'true count' rather than the 'label
count'.  The true count includes each individual value in a range - for
example, 0..2 counts as 3.
 - For jump tables, if the case block almost covers the entire range (32
entries or fewer from full coverage), the initial range check is removed
and the gaps included in the jump table.

 To give an example as to where these improvements take effect, the
"taicpu.calcsize" method in "/x86/aasmcpu.pas" has a large case block that
analyses the encoding of assembler commands - before, this only became a
linear list because while there are loads of entries, the "labelcnt" value
is still under 64 because a number of values are combined into a range. 
This is now changed to an efficient jump table, and furthermore, it is
almost exhaustive because the input type is a Char, and only a handful of
values redirect to the else block, which raises an internal error.  As a
result, the jump table is changed to contain all 256 possible values (1KB
in size), with addresses to the else block where appropriate.

 The patch doesn't contain the binary search yet, but this algorithm covers
situations that are unsuitable for a jump table, namely the domain is large
but the number of valid blocks is very sparse (for example, if the input
type is a Word, a number of small values are checked, and then it checks
the value 10,000 as well, which sits well beyond any other values).  The
jump tree is still used for excessively-large case blocks.

 I won't consider it ready for patching just yet because I want to create
some more tests first.  Still, what do you think?

 Gareth aka. Kit
 (P.S. I hope the patch isn't too large that it gets flagged for
moderation)
  

Links:
--
[1] http://wiki.freepascal.org/Case_Compiler_Optimization


case-improvement.patch
Description: Binary data
___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel