Impressive numbers, and a nice example for grasping lots of NB/Asm details.
On 27 June 2014 at 18:55, "Henrik Johansen" wrote:
>
> Soo, I started dabbling with the thing I talked about before last summer: letting String parameters in NB calls have an encoding: option.
> (There’s already a slice in the inbox to allow optional values other than true/false.)
>
> Thought I’d start with decoding; here’s a small preview of the part which does the actual decoding, after the needed String class has been determined and instantiated.
> While it’s a fallback path for when the platform doesn’t support SSE or other batch operations, it still uses some neat tricks (imho) that I thought others might enjoy on a Friday afternoon :)
>
> emitStandardDecodeUTF8CharactersFrom: aSource to: aDestination withCharSize: aCharSize scratchReg: scratchReg using: aGenerator
>         "Emit decoding using only standard x86 ops"
>         "We have already found what String class is needed for decoding aSource, and created an instance of the proper size"
>         "This implementation focuses on minimizing jumps and register usage, at the cost of loading from source one byte at a time."
>         "Input:
>                 aSource - memory pointer to C-string with UTF8 bytes
>                 aDestination - memory pointer to first var field of String instance
>                 scratchReg - a register which will be modified while decoding
>
>                 aCharSize - The size in bytes of each character in our destination string, known at emission time
>
>         Clobbers: scratchReg
>         aSource and aDestination will end up pointing to the end of the strings"
>
>         | asm scratch32 sLowByte sHighByte loop done oneByte twoBytes threeBytes |
>
>         asm := aGenerator asm.
>         loop := asm uniqueLabelName: 'utf8DecodeLoop'.
>         done := asm uniqueLabelName: 'utf8DecodingDone'.
>         scratch32 := scratchReg as32.
>         sLowByte  := scratch32 as8.
>         sHighByte := sLowByte asHighByte.
>
>         asm label: loop.
>         "Unroll the inner loop as many times as we want, or, well, at least as many times as the backwards jump will allow us to"
>         8 timesRepeat:[
>         oneByte := asm uniqueLabelName: 'utf8OneByteDecode'.
>         twoBytes := asm uniqueLabelName: 'utf8TwoByteDecode'.
>         threeBytes := asm uniqueLabelName: 'utf8ThreeByteDecode'.
>         asm xor: scratch32 with:scratch32.
>         asm or: sLowByte with: aSource ptr8.
>         asm cmp: sLowByte with: 0.
>         asm je: done.
>         asm add: aSource with: 1.
>         asm test: sLowByte with: 2r10000000 asUImm8.
>         asm jz: oneByte.
>         "We have a header, place its data bits as initial high byte value"
>         asm shl: scratch32 with: 8.
>         asm xor: sHighByte with: 2r11000000 asUImm8. "Strip 2 byte header"
>         asm test: sHighByte with: 2r00100000.
>         asm jz: twoBytes.
>         aCharSize > 1 ifTrue: [
>         asm xor: sHighByte with: 2r00100000. "Strip 3 byte header"
>         asm test: sHighByte with: 2r00010000.
>         asm jz: threeBytes.
>         "This is a 4-byte character"
>         asm xor: sHighByte with: 2r00010000. "Strip 4 byte header"
>         "Read one trailing byte, remove the header, and shift the data out of low byte"
>         asm or: sLowByte with: aSource ptr8.
>         asm shl: sLowByte with:2.
>         asm shl: scratch32 with: 6.
>         asm add: aSource with: 1.
> asm label: threeBytes.
>         "Read one trailing byte, remove the header, and shift the data out of low byte"
>         asm or: sLowByte with: aSource ptr8.
>         asm shl: sLowByte with:2.
>         asm shl: scratch32 with: 6.
>         asm add: aSource with: 1.
>         ].
> asm label: twoBytes.
>         "Read last trailing byte, remove header, and shift the data bits into proper place"
>         asm or: sLowByte with: aSource ptr8.
>         asm shl: sLowByte with:2.
>         asm shr: scratch32 with: 2.
>         asm add: aSource with: 1.
> asm label: oneByte.
>         asm mov: (aDestination ptr size: aCharSize) with: (scratch32 as: aCharSize).
>         asm add: aDestination with: aCharSize.].
>         asm jmp: loop.
>         asm label: done.
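For readers who want to follow the register juggling without an assembler at hand, here is a small C model of what each unrolled iteration emits for aCharSize = 4. It is a sketch, not NativeBoost code: the helper names (`or_low`, `xor_high`, `test_high`, `shl_low`, `decode_utf8`) are made up to stand in for the 8-bit operations on sLowByte and sHighByte, and it assumes valid, NUL-terminated UTF-8 and a large enough destination. Note that the four-byte check has to test bit 4 (`0x10`): the xor right before it clears bit 5 (`0x20`), so testing bit 5 again would always take the three-byte branch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers: 8-bit operations on a 32-bit scratch register,
   where sLowByte is bits 0-7 and sHighByte is bits 8-15. */
static uint32_t or_low(uint32_t r, uint8_t b)    { return r | b; }                  /* or   sLowByte, b  */
static uint32_t xor_high(uint32_t r, uint8_t m)  { return r ^ ((uint32_t)m << 8); } /* xor  sHighByte, m */
static int      test_high(uint32_t r, uint8_t m) { return ((r >> 8) & m) != 0; }    /* test sHighByte, m */
static uint32_t shl_low(uint32_t r, int n) {                                        /* shl  sLowByte, n  */
    return (r & ~0xFFu) | ((r << n) & 0xFFu);  /* bits shifted out of the low byte are lost */
}

/* Decodes a NUL-terminated UTF-8 buffer into 32-bit code points,
   following the order of the emitted instructions. Returns chars written. */
static size_t decode_utf8(const uint8_t *src, uint32_t *dst) {
    size_t n = 0;
    for (;;) {
        uint32_t r = or_low(0, *src);       /* xor scratch32; or sLowByte, [aSource] */
        if (r == 0) break;                  /* cmp sLowByte, 0; je done */
        src++;
        if (r & 0x80) {                     /* header byte (jz oneByte not taken) */
            int trailing = 1;
            r <<= 8;                        /* header moves into sHighByte */
            r = xor_high(r, 0xC0);          /* strip 2-byte header bits */
            if (test_high(r, 0x20)) {       /* bit 5 still set: 3 or 4 bytes */
                r = xor_high(r, 0x20);      /* strip 3-byte header bit */
                trailing = 2;
                if (test_high(r, 0x10)) {   /* bit 4 set: 4-byte header */
                    r = xor_high(r, 0x10);  /* strip 4-byte header bit */
                    trailing = 3;
                }
            }
            /* the 4-byte path falls through threeBytes into twoBytes */
            for (; trailing > 1; trailing--) {
                r = or_low(r, *src++);      /* or sLowByte, [aSource] */
                r = shl_low(r, 2);          /* pushes the 2r10 marker out of the byte */
                r <<= 6;                    /* make room for the next 6 data bits */
            }
            r = or_low(r, *src++);          /* twoBytes: last trailing byte */
            r = shl_low(r, 2);
            r >>= 2;                        /* undo the extra 2-bit shift */
        }
        dst[n++] = r;                       /* oneByte: store, advance aDestination */
    }
    return n;
}
```

For '€' (E2 82 AC) the scratch register goes 0xE2, then 0xE200 after `shl 8`, 0x2200 and 0x0200 as the header bits are stripped, 0x8200 after the first trailing byte, 0x82B0 after the second, and 0x20AC after the final `shr 2`. Because `shl sLowByte, 2` drops each continuation byte's `10` marker off the top of the 8-bit register, no separate masking instruction is needed.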
>
> And the relevant test code for that:
>
> testStandardDecodeWide
>         | bytes string |
>         "bytes := (ZnUTF8Encoder new encodeString: 'Cash, like €, is king'), #[0]."
>         bytes := #[67 97 115 104 44 32 108 105 107 101 32 226 130 172 44 32 105 115 32 107 105 110 103 0].
>         string := WideString new: bytes size - 1.
>         self testStandardDecode: bytes toWideString: string.
>         ^ string
>
> testStandardDecode: utf8Bytes toWideString:aString
>         <primitive: #primitiveNativeCall module: #NativeBoostPlugin>
>         ^ self nbCallout
>                 function: #(void #(char* utf8Bytes,  char* aString ))
>                 emit: [ :gen :proxy :asm |
>                         asm pop: asm EBX;
>                                 pop: asm ECX.
>                         self emitStandardDecodeUTF8CharactersFrom: asm EBX to: asm ECX withCharSize: 4 scratchReg: asm EAX using: gen.
>                         asm mov: EAX with: gen proxy nilObject  ]
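The test allocates `WideString new: bytes size - 1`, i.e. 23 slots, while the text decodes to 21 characters (the € takes three bytes), so the last two slots are left as character 0. A non-test caller would need a sizing pass before instantiating the String. Below is a hypothetical C sketch of such a pass, not part of the posted code. The names `utf8_sizing` and `utf8_size` are invented, and it assumes valid UTF-8. It counts every byte that is not a continuation byte (`2r10xxxxxx`), and it picks a 4-byte character size (WideString) as soon as a lead byte is `0xC4` or higher, since `C4 80` is U+0100, the first code point a ByteString cannot hold.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result of a sizing pass over NUL-terminated UTF-8. */
typedef struct {
    size_t chars;     /* number of characters to allocate */
    int    char_size; /* 1 (ByteString) or 4 (WideString) */
} utf8_sizing;

static utf8_sizing utf8_size(const uint8_t *s) {
    utf8_sizing out = { 0, 1 };
    for (; *s != 0; s++) {
        if ((*s & 0xC0) == 0x80) continue;  /* continuation byte: same character */
        out.chars++;
        if (*s >= 0xC4) out.char_size = 4;  /* lead byte of a code point >= U+0100 */
    }
    return out;
}
```

Counting lead bytes avoids decoding twice; the trade-off is an extra pass over the source before the decoding loop runs.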
>
> Which, though it’s currently cheating by knowing the string class/size in advance, isn’t a lot of overhead:
> ext := NBExternalString new.
> [ext testStandardDecodeWide]  bench  '5,030,000 per second.' '5,080,000 per second.' '5,190,000 per second.'
> Compared to an equivalent of testStandardDecodeWide, with emitStandard… removed from the primitive:
> [ext testEmptyDecode]  bench '5,850,000 per second.' '5,800,000 per second.' '5,640,000 per second.'
>
> … or compared to doing the decoding in the image after the call:
> int := ZnUTF8Encoder new.
> [int decodeBytes:#[67 97 115 104 44 32 108 105 107 101 32 226 130 172 44 32 105 115 32 107 105 110 103 0]] bench  '130,000 per second.' '131,000 per second.' '132,000 per second.'
>
> Cheers,
> Henry
>
