Impressive numbers and a nice example for grasping lots of NB/Asm details.
On 27 June 2014 at 18:55, "Henrik Johansen" <[email protected]> wrote:
>
> Soo, I started dabbling with the thing I talked about before last summer,
letting String parameters in NB calls have an encoding: option.
> (There’s already a slice in inbox to allow optional values other than
true/false)
>
> Thought I’d start with decoding; here’s a small preview of the part which
does the actual decoding, after the needed string class has been determined
and instantiated.
> While it’s a fallback path for when the platform doesn’t support SSE or
other batch operations, it’s still using some neat tricks (imho) I thought
others might enjoy on a Friday afternoon :)
>
> emitStandardDecodeUTF8CharactersFrom: aSource to: aDestination
withCharSize: aCharSize scratchReg: scratchReg using: aGenerator
> "Emit decoding using only standard x86 ops"
> "We have already found what String class is needed for decoding
aSource, and created an instance of the proper size"
> "This implementation focuses on minimizing jumps and register
usage, at the cost of loading from source one byte at a time."
> "Input:
> aSource - memory pointer to C-string with UTF8 bytes
> aDestination - memory pointer to first var field of String
instance
> scratchReg - a register which will be modified while
decoding
>
> aCharSize - The size in bytes of each character in our
destination string, known at emission time
>
> Clobbers: scratchReg
> aSource and aDestination will end up pointing to end of strings"
>
> | asm scratch32 sLowByte sHighByte loop done oneByte twoBytes
threeBytes |
>
> asm := aGenerator asm.
> loop := asm uniqueLabelName: 'utf8DecodeLoop'.
> done := asm uniqueLabelName: 'utf8DecodingDone'.
> scratch32 := scratchReg as32.
> sLowByte := scratch32 as8.
> sHighByte := sLowByte asHighByte.
>
> asm label: loop.
> "Unroll the inner loop as many times as we want, or, well, at
least as many times as the backwards jump will allow us to"
> 8 timesRepeat:[
> oneByte := asm uniqueLabelName: 'utf8OneByteDecode'.
> twoBytes := asm uniqueLabelName: 'utf8TwoByteDecode'.
> threeBytes := asm uniqueLabelName: 'utf8ThreeByteDecode'.
> asm xor: scratch32 with:scratch32.
> asm or: sLowByte with: aSource ptr8.
> asm cmp: sLowByte with: 0.
> asm je: done.
> asm add: aSource with: 1.
> asm test: sLowByte with: 2r10000000 asUImm8.
> asm jz: oneByte.
> "We have a header, place its data bits as initial high byte value"
> asm shl: scratch32 with: 8.
> asm xor: sHighByte with: 2r11000000 asUImm8. "Strip 2 byte header"
> asm test: sHighByte with: 2r00100000.
> asm jz: twoBytes.
> aCharSize > 1 ifTrue: [
> asm xor: sHighByte with: 2r00100000. "Strip 3 byte header"
> asm test: sHighByte with: 2r00010000.
> asm jz: threeBytes.
> "This is a 4-byte character"
> asm xor: sHighByte with:2r00010000."Strip 4 byte header"
> "Read one trailing byte, remove the header, and shift the data
out of low byte"
> asm or: sLowByte with: aSource ptr8.
> asm shl: sLowByte with:2.
> asm shl: scratch32 with: 6.
> asm add: aSource with: 1.
> asm label: threeBytes.
> "Read one trailing byte, remove the header, and shift the data
out of low byte"
> asm or: sLowByte with: aSource ptr8.
> asm shl: sLowByte with:2.
> asm shl: scratch32 with: 6.
> asm add: aSource with: 1.
> ].
> asm label: twoBytes.
> "Read last trailing byte, remove header, and shift the data bits
into proper place"
> asm or: sLowByte with: aSource ptr8.
> asm shl: sLowByte with:2.
> asm shr: scratch32 with: 2.
> asm add: aSource with: 1.
> asm label: oneByte.
> asm mov: (aDestination ptr size: aCharSize) with: (scratch32 as:
aCharSize).
> asm add: aDestination with: aCharSize.].
> asm jmp: loop.
> asm label: done.
>
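For anyone who wants to check the bit arithmetic without assembling anything, here is a minimal plain-Smalltalk sketch of the same scratch-register steps: header byte moved into bits 8-15, header bits stripped with xor, each trailing byte or-ed into the low byte and shifted left by 2 to drop its 2r10 prefix, then the whole register shifted into place. It is only a model of the idea, not the emitted code; decodeUtf8, readTrailing and the other names are made up for illustration, and it assumes well-formed, NUL-terminated input just like the fallback path does.

```smalltalk
"Plain-Smalltalk model of the scratch-register steps (hypothetical names,
 not part of NativeBoost). Returns the decoded code points as an Array."
| decodeUtf8 |
decodeUtf8 := [:bytes | | out i scratch readTrailing |
    out := OrderedCollection new.
    i := 1.
    "Model of: or sLowByte with [src]; shl sLowByte by 2; add src, 1.
     Shifting only the low byte pushes the 2r10 continuation header out."
    readTrailing := [
        scratch := scratch bitOr: (((bytes at: i) bitShift: 2) bitAnd: 16rFF).
        i := i + 1].
    [(bytes at: i) ~= 0] whileTrue: [
        scratch := bytes at: i.
        i := i + 1.
        (scratch bitAnd: 2r10000000) ~= 0 ifTrue: [
            "Header byte: move it to bits 8-15 and strip the 2r11 prefix."
            scratch := (scratch bitShift: 8) bitXor: 16rC000.
            (scratch bitAnd: 16r2000) ~= 0 ifTrue: [
                scratch := scratch bitXor: 16r2000. "strip 3-byte header bit"
                (scratch bitAnd: 16r1000) ~= 0 ifTrue: [
                    scratch := scratch bitXor: 16r1000. "strip 4-byte header bit"
                    readTrailing value.
                    scratch := scratch bitShift: 6].
                readTrailing value.
                scratch := scratch bitShift: 6].
            "Last trailing byte: shift the whole register back down by 2."
            readTrailing value.
            scratch := scratch bitShift: -2].
        out add: scratch].
    out asArray].

"The three bytes 226 130 172 are the euro sign from the test string."
Transcript show: (decodeUtf8 value: #[226 130 172 0]) printString; cr.
```

The model makes it easy to see why no masking instruction is needed: the continuation header never survives the 8-bit shift, so the only and-style work left is the xor on the header byte.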
> And the relevant test code for that:
>
> testStandardDecodeWide
> | bytes string |
> "bytes := (ZnUTF8Encoder new encodeString: 'Cash, like €, is
king'), #[0]."
> bytes := #[67 97 115 104 44 32 108 105 107 101 32 226 130 172 44
32 105 115 32 107 105 110 103 0].
> string := WideString new: bytes size - 1.
> self testStandardDecode: bytes toWideString: string.
> ^ string
>
> testStandardDecode: utf8Bytes toWideString:aString
> <primitive: #primitiveNativeCall module: #NativeBoostPlugin>
> ^ self nbCallout
> function: #(void #(char* utf8Bytes, char* aString ))
> emit: [ :gen :proxy :asm |
> asm pop: asm EBX;
> pop: asm ECX.
> self emitStandardDecodeUTF8CharactersFrom: asm
EBX to: asm ECX withCharSize: 4 scratchReg: asm EAX using: gen.
> asm mov: EAX with: gen proxy nilObject ]
>
> Which, though it’s currently cheating by knowing the string class/size in advance,
isn’t a lot of overhead:
> ext := NBExternalString new.
> [ext testStandardDecodeWide] bench '5,030,000 per second.' '5,080,000
per second.' '5,190,000 per second.'
> Compared to an equivalent of testStandardDecodeWide, with emitStandard…
removed from the primitive:
> [ext testEmptyDecode] bench '5,850,000 per second.' '5,800,000 per
second.' '5,640,000 per second.'
>
> … or compared to doing the decoding in image after the call:
> int := ZnUTF8Encoder new.
> [int decodeBytes:#[67 97 115 104 44 32 108 105 107 101 32 226 130 172 44
32 105 115 32 107 105 110 103 0]] bench '130,000 per second.' '131,000 per
second.' '132,000 per second.'
>
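To put those figures side by side, a small sketch that averages the three runs of each bench and converts them to time per call. It uses only the numbers posted above; the variable names are mine, and it assumes the three benchmarks are comparable enough to subtract and divide, which is a simplification (the in-image variant, for instance, also decodes the trailing 0 byte).

```smalltalk
"Per-call cost implied by the posted bench figures (runs per second)."
| mean native empty inImage nsPerCall |
mean := [:runs | (runs inject: 0 into: [:a :b | a + b]) / runs size].
native := mean value: #(5030000 5080000 5190000).   "testStandardDecodeWide"
empty := mean value: #(5850000 5800000 5640000).    "testEmptyDecode"
inImage := mean value: #(130000 131000 132000).     "ZnUTF8Encoder decodeBytes:"
nsPerCall := [:perSecond | 1000000000 / perSecond].
Transcript
    show: 'decode cost per call (ns): ';
    show: ((nsPerCall value: native) - (nsPerCall value: empty)) asFloat printString; cr;
    show: 'native vs in-image throughput: ';
    show: (native / inImage) asFloat printString; cr.
```

Under those assumptions the decoding itself adds on the order of 23 ns per call on top of the empty callout, and the native path runs at roughly 39 times the in-image throughput.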
> Cheers,
> Henry
>