Re: Patch: Optimise ascii char classificaton

Dominique Pellé Wed, 12 Jun 2013 12:34:55 -0700

LCD 47 <[email protected]> wrote:

> On 12 June 2013, Dominique Pellé <[email protected]> wrote:
> [...]
>> You can half the number of comparisons when doing range checking.
>> Instead of doing 2 comparisons...
>>
>>    if (c >= '0' && c <= '9')
>>
>> You can do it in one comparison with this trick:
>>
>>   if ((unsigned)c - '0' < 10)
> [...]
>
>     Actually, the second version is a bit slower than the first. :) A
> comparison is exactly as fast as a subtraction of the same length, and
> (unsigned)c - '0' needs to store the result in  memory.
>
>     /lcd


It depends on the machine of course, but often branching
is more expensive because the processor can mispredict the
outcome, causing it to discard in pipeline instruction(s) already
partially processed:

http://en.wikipedia.org/wiki/Branch_misprediction
http://en.wikipedia.org/wiki/Instruction_pipeline#Branches

That's why there are people who find clever branchless
algorithms for all kind of things (min, max, abs...).
The branchless abs() algorithm is even apparently
patented (sigh):

http://www.strchr.com/optimized_abs_function

Having said that, I have compiled this function...

int is_digit(int d)
{
#ifdef OPTIMIZE
  return ((unsigned)d - '0') <= 9;
#else
  return d >= '0' && d <= '9';
#endif
}

With gcc -O0 on x86_64, the code produced is shorter
and faster with -DOPTIMIZE as I was hoping for.

-O0 without -DOPTIMIZE (two comparisons):

000000000040051c <is_digit>:
  40051c:    55                       push   %rbp
  40051d:    48 89 e5                 mov    %rsp,%rbp
  400520:    89 7d fc                 mov    %edi,-0x4(%rbp)
  400523:    83 7d fc 2f              cmpl   $0x2f,-0x4(%rbp)
  400527:    7e 0d                    jle    400536 <is_digit+0x1a>
  400529:    83 7d fc 39              cmpl   $0x39,-0x4(%rbp)
  40052d:    7f 07                    jg     400536 <is_digit+0x1a>
  40052f:    b8 01 00 00 00           mov    $0x1,%eax
  400534:    eb 05                    jmp    40053b <is_digit+0x1f>
  400536:    b8 00 00 00 00           mov    $0x0,%eax
  40053b:    5d                       pop    %rbp
  40053c:    c3                       retq

-O0 -DOPTIMIZE (one comparison):

000000000040051c <is_digit>:
  40051c:    55                       push   %rbp
  40051d:    48 89 e5                 mov    %rsp,%rbp
  400520:    89 7d fc                 mov    %edi,-0x4(%rbp)
  400523:    8b 45 fc                 mov    -0x4(%rbp),%eax
  400526:    83 e8 30                 sub    $0x30,%eax
  400529:    83 f8 09                 cmp    $0x9,%eax
  40052c:    0f 96 c0                 setbe  %al
  40052f:    0f b6 c0                 movzbl %al,%eax
  400532:    5d                       pop    %rbp
  400533:    c3                       retq

But with gcc -O1, -O2, or -O3, the generated code is
identical anyway whether OPTIMIZE is defined or not.
For example gcc -O3 produces this code with/without
-DOPTIMIZE:

00000000004005f0 <is_digit>:
  4005f0:    83 ef 30                 sub    $0x30,%edi
  4005f3:    31 c0                    xor    %eax,%eax
  4005f5:    83 ff 09                 cmp    $0x9,%edi
  4005f8:    0f 96 c0                 setbe  %al
  4005fb:    c3                       retq
  4005fc:    0f 1f 40 00              nopl   0x0(%rax)

Dominique

-- 
-- 
You received this message from the "vim_dev" maillist.
Do not top-post! Type your reply below the text you are replying to.
For more information, visit http://www.vim.org/maillist.php

--- 
You received this message because you are subscribed to the Google Groups 
"vim_dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/groups/opt_out.

Re: Patch: Optimise ascii char classificaton

Raspunde prin e-mail lui