Re: [fpc-devel] Sorry for poor testing
Hi! I have a proposition for optimizing big case statements with sparse case labels. Have you already thought of using a hash function to drastically reduce the size of the jump table? I have made some tests. For example, assume a case statement with 80 values for case labels in the range between 1 and 5000. No matter how they are distributed, It is always possible to find a hash function of the type y=(a*x^2+b*x) and 252 , to map those 80 values directy to a 64 entry hash table with 16 bit jump offset values (this hash function gives directly the offset of the correct jump table entry), where 3 or less case label values are mapped to each entry. You just have to try different prime numbers in the range 2-3 for the values of a and b, to find a hash function that works well for the given combination of case label values. At runtime, after calculation of the hash function - one addition, two 16-bit multiplications and one "and" operation - you need just three or less 16 bit comparisons to find the correct case branch for all possible values of the case variable. Franz ___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
Re: [fpc-devel] Sorry for poor testing
For that very reason, I put an upper limit of 2,048 on the jump table size (which results in an 8 KiB table), although it was modified later so this bound doesn't apply if every single label points to a single value and more or less forms a continuum (e.g, you have 2,500 labels and none of them are a range). For a linear search, each branch takes between 2 and 4 instructions to test, if I remember correctly, although there is a lot of dependency, since it usually involves a subtraction followed by a conditional jump. My additional changes over at #34859 introduce some extra conditional jumps to break out sooner once the program knows there won't be a match (because, for example, there are labels for just 1 and 3 and the input value is 2). Before, the flow only jumped out if it reached a range and the input was less than the lower bound. I did some preliminary testing with tests/bench/bcase.pp and the overall timespan is about 5% faster. Gareth aka. Kit On Tue 15/01/19 17:27 , Martok list...@martoks-place.de sent: Am 14.01.2019 um 15:01 schrieb J. Gareth Moreton: > Martok mentioned doing some checks differently in the bug report in question, > such as 6 comparisons being faster than a jump table. Are there any others > worth mentioning? Not neccessarily faster, but in that code definitely smaller. Is there a way to directly estimate how large the generated code might be, maybe something like "numlabels*constant"? For the "faster" question, the cache line occupation matters more than instruction throughput. If your jumptable gets large enough to force a full data cache line flush, you could have done a lot of instructions in the time the CPU waits for the data. And I haven't found reliable information how the Spectre-related microcode updates changed that, given that both cmp/jmp and jmp[indirect] used to be heavily optimized, but were both affected. -- Regards, Martok ___ fpc-devel maillist - fpc-devel@lists.freepascal.org [1] http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel [2]">http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel Links: -- [1] mailto:fpc-devel@lists.freepascal.org [2] http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel ___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
Re: [fpc-devel] Sorry for poor testing
Am 14.01.2019 um 15:01 schrieb J. Gareth Moreton: > Martok mentioned doing some checks differently in the bug report in question, > such as 6 comparisons being faster than a jump table. Are there any others > worth mentioning? Not neccessarily faster, but in that code definitely smaller. Is there a way to directly estimate how large the generated code might be, maybe something like "numlabels*constant"? For the "faster" question, the cache line occupation matters more than instruction throughput. If your jumptable gets large enough to force a full data cache line flush, you could have done a lot of instructions in the time the CPU waits for the data. And I haven't found reliable information how the Spectre-related microcode updates changed that, given that both cmp/jmp and jmp[indirect] used to be heavily optimized, but were both affected. -- Regards, Martok ___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
Re: [fpc-devel] Sorry for poor testing
Aah cool. Thanks for that tip. Hmmm... I see "riot" in the flags... I think it's going to be a riot with the errors found! Gareth aka. Kit On Mon 14/01/19 18:17 , Pierre Muller pie...@freepascal.org sent: Hi all, Le 14/01/2019 à 15:01, J. Gareth Moreton a écrit : > Hi everyone, > > I apologise I didn't properly test my case block improvements, especially where optimising for size is concerned. As someone who has worked in SQA, I should have known better. > > I've also spotted a potential overflow condition (in some situations, max_dist can become negative) - this is thankfully disguised, but I would like to properly trap it and handle it cleanly and safely. You should try to add -CriotR to OPT when cycliing the compiler, as it might find the overfows for you... and even also do make fullcycle OPT="-n -CriotR -gl" as testing the change for 16 bit address CPU can also reveal more range checking/overflow exceptions. Pierre ___ fpc-devel maillist - fpc-devel@lists.freepascal.org [1] http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel [2]">http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel Links: -- [1] mailto:fpc-devel@lists.freepascal.org [2] http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel ___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
Re: [fpc-devel] Sorry for poor testing
Hi all, Le 14/01/2019 à 15:01, J. Gareth Moreton a écrit : > Hi everyone, > > I apologise I didn't properly test my case block improvements, especially > where optimising for size is concerned. As someone who has worked in SQA, I > should have known better. > > I've also spotted a potential overflow condition (in some situations, > max_dist can become negative) - this is thankfully disguised, but I would > like to properly trap it and handle it cleanly and safely. You should try to add -CriotR to OPT when cycliing the compiler, as it might find the overfows for you... and even also do make fullcycle OPT="-n -CriotR -gl" as testing the change for 16 bit address CPU can also reveal more range checking/overflow exceptions. Pierre ___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel