Giovanni Bajo added the comment:

For short strings, you might want to have a look at the way you fetch the final 
partial word from memory.

If the string is >= 8 bytes, you can fetch the last partial word with a single 
unaligned 8-byte load that ends at the last byte of the string, followed by a 
shift, instead of using a switch as the reference code does.
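
To make the idea concrete, here is a rough sketch of both approaches. The names tail_switch and tail_shift are made up for illustration and are not taken from the reference code or from the patch. The sketch assumes a little-endian host and that the caller has already checked len >= 8. It only assembles the trailing bytes and leaves out the length byte that SipHash ORs into the top of the final word.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Switch-style tail (in the spirit of the reference code): assemble the
 * last (len & 7) bytes one at a time into a little-endian word. */
static uint64_t tail_switch(const unsigned char *p, size_t len)
{
    const unsigned char *end = p + (len & ~(size_t)7); /* start of the tail */
    uint64_t b = 0;

    switch (len & 7) {
    case 7: b |= (uint64_t)end[6] << 48; /* fall through */
    case 6: b |= (uint64_t)end[5] << 40; /* fall through */
    case 5: b |= (uint64_t)end[4] << 32; /* fall through */
    case 4: b |= (uint64_t)end[3] << 24; /* fall through */
    case 3: b |= (uint64_t)end[2] << 16; /* fall through */
    case 2: b |= (uint64_t)end[1] << 8;  /* fall through */
    case 1: b |= (uint64_t)end[0];       /* fall through */
    case 0: break;
    }
    return b;
}

/* Load-and-shift tail (hypothetical helper). Requires len >= 8.
 * ASSUMPTION: little-endian host. The 8 bytes ending at the last byte of
 * the string are loaded in one go; memcpy keeps the unaligned access
 * well-defined in C. The tail bytes sit in the high end of that word, so
 * a right shift moves them down to the low end. */
static uint64_t tail_shift(const unsigned char *p, size_t len)
{
    size_t rem = len & 7;
    uint64_t w;

    if (rem == 0)
        return 0; /* no partial word; a 64-bit shift would be undefined */

    memcpy(&w, p + len - 8, sizeof w);
    return w >> (8 * (8 - rem));
}
```

For an 11-byte string, for example, tail_shift loads bytes 3..10 and shifts right by 40 bits, which leaves bytes 8..10 in the low three bytes, the same value the switch builds. Strings shorter than 8 bytes would still need the byte-by-byte path, since the 8-byte load would start before the beginning of the buffer.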


