Re: [fpc-devel] Sorry for poor testing

2019-01-16 Thread Franz Müller

Hi!

I have a proposition for optimizing big case statements with sparse case 
labels. Have you already thought of using a hash function to drastically 
reduce the size of the jump table?


I have made some tests. For example, assume a case statement with 80 
values for case labels in the range between 1 and 5000.


No matter how they are distributed, It is always possible to find a hash 
function of the type y=(a*x^2+b*x) and 252 , to map those 80 values 
directy to a 64 entry hash table with 16 bit jump offset values (this 
hash function gives directly the offset of the correct jump table 
entry), where 3 or less case label values are mapped to each entry. You 
just have to try different prime numbers in the range 2-3 for 
the values of a and b, to find a hash function that works well for the 
given combination of case label values. At runtime, after calculation of 
the hash function - one addition, two 16-bit multiplications and one 
"and" operation - you need just three or less 16 bit comparisons to find 
the correct case branch for all possible values of the case variable.


Franz

___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel


Re: [fpc-devel] Sorry for poor testing

2019-01-15 Thread J. Gareth Moreton
 For that very reason, I put an upper limit of 2,048 on the jump table size
(which results in an 8 KiB table), although it was modified later so this
bound doesn't apply if every single label points to a single value and more
or less forms a continuum (e.g, you have 2,500 labels and none of them are
a range).

 For a linear search, each branch takes between 2 and 4 instructions to
test, if I remember correctly, although there is a lot of dependency, since
it usually involves a subtraction followed by a conditional jump.  My
additional changes over at #34859 introduce some extra conditional jumps to
break out sooner once the program knows there won't be a match (because,
for example, there are labels for just 1 and 3 and the input value is 2). 
Before, the flow only jumped out if it reached a range and the input was
less than the lower bound.  I did some preliminary testing with
tests/bench/bcase.pp and the overall timespan is about 5% faster.

 Gareth aka. Kit

 On Tue 15/01/19 17:27 , Martok list...@martoks-place.de sent:
 Am 14.01.2019 um 15:01 schrieb J. Gareth Moreton: 
 > Martok mentioned doing some checks differently in the bug report in
question, 
 > such as 6 comparisons being faster than a jump table.  Are there any
others 
 > worth mentioning? 
 Not neccessarily faster, but in that code definitely smaller. Is there a
way to 
 directly estimate how large the generated code might be, maybe something
like 
 "numlabels*constant"? 

 For the "faster" question, the cache line occupation matters more than 
 instruction throughput. If your jumptable gets large enough to force a
full data 
 cache line flush, you could have done a lot of instructions in the time
the CPU 
 waits for the data. And I haven't found reliable information how the 
 Spectre-related microcode updates changed that, given that both cmp/jmp
and 
 jmp[indirect] used to be heavily optimized, but were both affected. 

 -- 
 Regards, 
 Martok 

 ___ 
 fpc-devel maillist - fpc-devel@lists.freepascal.org [1] 
 http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
[2]">http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel 

 

Links:
--
[1] mailto:fpc-devel@lists.freepascal.org
[2] http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel


Re: [fpc-devel] Sorry for poor testing

2019-01-15 Thread Martok
Am 14.01.2019 um 15:01 schrieb J. Gareth Moreton:
> Martok mentioned doing some checks differently in the bug report in question,
> such as 6 comparisons being faster than a jump table.  Are there any others
> worth mentioning?
Not neccessarily faster, but in that code definitely smaller. Is there a way to
directly estimate how large the generated code might be, maybe something like
"numlabels*constant"?

For the "faster" question, the cache line occupation matters more than
instruction throughput. If your jumptable gets large enough to force a full data
cache line flush, you could have done a lot of instructions in the time the CPU
waits for the data. And I haven't found reliable information how the
Spectre-related microcode updates changed that, given that both cmp/jmp and
jmp[indirect] used to be heavily optimized, but were both affected.


-- 
Regards,
Martok


___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel


Re: [fpc-devel] Sorry for poor testing

2019-01-14 Thread J. Gareth Moreton
 Aah cool.  Thanks for that tip.  Hmmm... I see "riot" in the flags... I
think it's going to be a riot with the errors found!

 Gareth aka. Kit

 On Mon 14/01/19 18:17 , Pierre Muller pie...@freepascal.org sent:
 Hi all, 

 Le 14/01/2019 à 15:01, J. Gareth Moreton a écrit : 
 > Hi everyone, 
 > 
 > I apologise I didn't properly test my case block improvements,
especially where optimising for size is concerned.  As someone who has
worked in SQA, I should have known better. 
 > 
 > I've also spotted a potential overflow condition (in some situations,
max_dist can become negative) - this is thankfully disguised, but I would
like to properly trap it and handle it cleanly and safely. 

 You should try to add 
 -CriotR to OPT when cycliing the compiler, 
 as it might find the overfows for you... 
 and even also do 
 make fullcycle OPT="-n -CriotR -gl" 
 as testing the change for 16 bit address 
 CPU can also reveal more range checking/overflow exceptions. 

 Pierre 

 ___ 
 fpc-devel maillist - fpc-devel@lists.freepascal.org [1] 
 http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
[2]">http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel 

 

Links:
--
[1] mailto:fpc-devel@lists.freepascal.org
[2] http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel


Re: [fpc-devel] Sorry for poor testing

2019-01-14 Thread Pierre Muller
Hi all,


Le 14/01/2019 à 15:01, J. Gareth Moreton a écrit :
> Hi everyone,
> 
> I apologise I didn't properly test my case block improvements, especially 
> where optimising for size is concerned.  As someone who has worked in SQA, I 
> should have known better.
> 
> I've also spotted a potential overflow condition (in some situations, 
> max_dist can become negative) - this is thankfully disguised, but I would 
> like to properly trap it and handle it cleanly and safely.

  You should try to add
  -CriotR to OPT when cycliing the compiler,
as it might find the overfows for you...
and even also do
make  fullcycle OPT="-n -CriotR -gl"
as testing the change for 16 bit address
CPU can also reveal more range checking/overflow exceptions.

Pierre

___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel