Nice, this could be very useful: slow UTF-8 encoding/decoding is a potential bottleneck.

On 27 Jun 2014, at 19:23, [email protected] wrote:

> Impressive numbers and a nice example to grasp lots of NB/Asm details.
>
> On 27 June 2014 18:55, "Henrik Johansen" <[email protected]> wrote:
> >
> > Soo, I started dabbling with the thing I talked about before last summer,
> > letting String parameters in NB calls have an encoding: option.
> > (There’s already a slice in inbox to allow optional values other than
> > true/false)
> >
> > Thought I’d start with decoding; here’s a small preview of the part which
> > does the actual decoding, after the needed string class has been determined
> > and instantiated.
> > While it’s a fallback path for when the platform doesn’t support SSE or
> > other batch operations, it’s still using some neat tricks (imho) I thought
> > others might enjoy on a Friday afternoon :)
> >
> > emitStandardDecodeUTF8CharactersFrom: aSource to: aDestination withCharSize: aCharSize scratchReg: scratchReg using: aGenerator
> >     "Emit decoding using only standard x86 ops"
> >     "We have already found what String class is needed for decoding
> >     aSource, and created an instance of the proper size"
> >     "This implementation focuses on minimizing jumps and register
> >     usage, at the cost of loading from source one byte at a time."
> >     "Input:
> >         aSource      - memory pointer to C-string with UTF8 bytes
> >         aDestination - memory pointer to first var field of String instance
> >         scratchReg   - a register which will be modified while decoding
> >
> >         aCharSize    - The size in bytes of each character in our
> >                        destination string, known at emission time
> >
> >     Clobbers: scratchReg
> >     aSource and aDestination will end up pointing to end of strings"
> >
> >     | asm scratch32 sLowByte sHighByte loop done oneByte twoBytes threeBytes |
> >
> >     asm := aGenerator asm.
> >     loop := asm uniqueLabelName: 'utf8DecodeLoop'.
> >     done := asm uniqueLabelName: 'utf8DecodingDone'.
> >     scratch32 := scratchReg as32.
> >     sLowByte := scratch32 as8.
> >     sHighByte := sLowByte asHighByte.
> >
> >     asm label: loop.
> >     "Unroll the inner loop as many times as we want, or, well, at least
> >     as many times as the backwards jump will allow us to"
> >     8 timesRepeat:[
> >         oneByte := asm uniqueLabelName: 'utf8OneByteDecode'.
> >         twoBytes := asm uniqueLabelName: 'utf8TwoByteDecode'.
> >         threeBytes := asm uniqueLabelName: 'utf8ThreeByteDecode'.
> >         asm xor: scratch32 with:scratch32.
> >         asm or: sLowByte with: aSource ptr8.
> >         asm cmp: sLowByte with: 0.
> >         asm je: done.
> >         asm add: aSource with: 1.
> >         asm test: sLowByte with: 2r10000000 asUImm8.
> >         asm jz: oneByte.
> >         "We have a header, place its data bits as initial high byte value"
> >         asm shl: scratch32 with: 8.
> >         asm xor: sHighByte with: 2r11000000 asUImm8. "Strip 2 byte header"
> >         asm test: sHighByte with: 2r00100000.
> >         asm jz: twoBytes.
> >         aCharSize > 1 ifTrue: [
> >             asm xor: sHighByte with: 2r00100000. "Strip 3 byte header"
> >             asm test: sHighByte with: 2r00010000.
> >             asm jz: threeBytes.
> >             "This is a 4-byte character"
> >             asm xor: sHighByte with:2r00010000. "Strip 4 byte header"
> >             "Read one trailing byte, remove the header, and shift the data
> >             out of low byte"
> >             asm or: sLowByte with: aSource ptr8.
> >             asm shl: sLowByte with:2.
> >             asm shl: scratch32 with: 6.
> >             asm add: aSource with: 1.
> >             asm label: threeBytes.
> >             "Read one trailing byte, remove the header, and shift the data
> >             out of low byte"
> >             asm or: sLowByte with: aSource ptr8.
> >             asm shl: sLowByte with:2.
> >             asm shl: scratch32 with: 6.
> >             asm add: aSource with: 1.
> >         ].
> >         asm label: twoBytes.
> >         "Read last trailing byte, remove header, and shift the data bits
> >         into proper place"
> >         asm or: sLowByte with: aSource ptr8.
> >         asm shl: sLowByte with:2.
> >         asm shr: scratch32 with: 2.
> >         asm add: aSource with: 1.
> >         asm label: oneByte.
> >         asm mov: (aDestination ptr size: aCharSize) with: (scratch32 as: aCharSize).
> >         asm add: aDestination with: aCharSize.].
> >     asm jmp: loop.
> >     asm label: done.
> >
> > And the relevant test code for that:
> >
> > testStandardDecodeWide
> >     | bytes string |
> >     "bytes := (ZnUTF8Encoder new encodeString: 'Cash, like €, is king'), #[0]."
> >     bytes := #[67 97 115 104 44 32 108 105 107 101 32 226 130 172 44 32 105 115 32 107 105 110 103 0].
> >     string := WideString new: bytes size - 1.
> >     self testStandardDecode: bytes toWideString: string.
> >     ^ string
> >
> > testStandardDecode: utf8Bytes toWideString:aString
> >     <primitive: #primitiveNativeCall module: #NativeBoostPlugin>
> >     ^ self nbCallout
> >         function: #(void #(char* utf8Bytes, char* aString ))
> >         emit: [ :gen :proxy :asm |
> >             asm pop: asm EBX;
> >                 pop: asm ECX.
> >             self emitStandardDecodeUTF8CharactersFrom: asm EBX to: asm ECX withCharSize: 4 scratchReg: asm EAX using: gen.
> >             asm mov: EAX with: gen proxy nilObject ]
> >
> > Which, though it’s currently cheating by pre-knowing the string class/size,
> > isn’t a lot of overhead:
> >
> > ext := NBExternalString new.
> > [ext testStandardDecodeWide] bench
> >     '5,030,000 per second.'
> >     '5,080,000 per second.'
> >     '5,190,000 per second.'
> >
> > Compared to an equivalent of testStandardDecodeWide, with emitStandard…
> > removed from the primitive:
> >
> > [ext testEmptyDecode] bench
> >     '5,850,000 per second.'
> >     '5,800,000 per second.'
> >     '5,640,000 per second.'
> >
> > … or compared to doing the decoding in image after the call:
> >
> > int := ZnUTF8Encoder new.
> > [int decodeBytes: #[67 97 115 104 44 32 108 105 107 101 32 226 130 172 44 32 105 115 32 107 105 110 103 0]] bench
> >     '130,000 per second.'
> >     '131,000 per second.'
> >     '132,000 per second.'
> >
> > Cheers,
> > Henry
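
For readers who want to follow the header-stripping logic without the assembler, here is a minimal in-image sketch of the same decoding steps in plain Pharo Smalltalk. It is not Henrik's code and not part of NativeBoost: the variable names are made up for illustration, it uses bitAnd: masks rather than the xor/shift tricks of the emitted code, and like the emitted code it assumes well-formed, NUL-terminated UTF-8 input (no validation of trailing bytes, overlong forms or truncation).

```smalltalk
"Workspace sketch (hypothetical, for illustration only):
 decode a NUL-terminated UTF-8 ByteArray into a String."
| bytes in out lead codePoint trailing |
bytes := #[67 97 115 104 44 32 108 105 107 101 32 226 130 172 44 32 105 115 32 107 105 110 103 0].
in := bytes readStream.
out := WriteStream on: (WideString new: bytes size).
[ in atEnd not and: [ (lead := in next) ~= 0 ] ] whileTrue: [
	"Classify the lead byte by its header bits and keep only its data bits."
	lead < 2r10000000
		ifTrue: [ codePoint := lead. trailing := 0 ]
		ifFalse: [
			lead < 2r11100000
				ifTrue: [ codePoint := lead bitAnd: 2r00011111. trailing := 1 ]
				ifFalse: [
					lead < 2r11110000
						ifTrue: [ codePoint := lead bitAnd: 2r00001111. trailing := 2 ]
						ifFalse: [ codePoint := lead bitAnd: 2r00000111. trailing := 3 ] ] ].
	"Each trailing byte has the form 10xxxxxx and contributes six data bits."
	trailing timesRepeat: [
		codePoint := (codePoint bitShift: 6) + (in next bitAnd: 2r00111111) ].
	out nextPut: (Character value: codePoint) ].
out contents
```

Evaluated on the test bytes from the message, this should give the same string as the commented-out ZnUTF8Encoder line, with the three bytes 226 130 172 collapsing into the single code point 8364 (the euro sign). It is only meant as a readable reference for checking the native version, not as a fast path.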
