Re: Reducing the cost of autodecoding

2016-10-25 Thread safety0ff via Digitalmars-d
On Tuesday, 25 October 2016 at 21:46:30 UTC, safety0ff wrote: P.S. I am aware that this pessimises popFront for code which only counts codepoints without inspecting them. Unfortunately it also changes the API of popFront to throw on invalid characters. So the example would need to be

Re: Reducing the cost of autodecoding

2016-10-25 Thread safety0ff via Digitalmars-d
On Wednesday, 12 October 2016 at 13:53:03 UTC, Andrei Alexandrescu wrote: Now it's time to look at the end-to-end cost of autodecoding. Some food for thought: - front necessarily needs to compute the number of bytes to advance. - We can't change the API to share data between front and

Re: Reducing the cost of autodecoding

2016-10-25 Thread Steven Schveighoffer via Digitalmars-d
On 10/25/16 3:32 PM, Stefam Koch wrote: On Tuesday, 25 October 2016 at 19:16:07 UTC, Steven Schveighoffer wrote: Should behave better on 32-bit system. You could also store 3-bits to avoid extra addition. The whole point of a LUT to begin with is to reduce instructions. Yes, I know. But

Re: Reducing the cost of autodecoding

2016-10-25 Thread Stefam Koch via Digitalmars-d
On Tuesday, 25 October 2016 at 19:16:07 UTC, Steven Schveighoffer wrote: On 10/25/16 12:57 PM, Dmitry Olshansky wrote: On 10/12/16 3:53 PM, Andrei Alexandrescu wrote: So we've had a good run with making popFront smaller. In ASCII microbenchmarks with ldc, the speed is indistinguishable from s

Re: Reducing the cost of autodecoding

2016-10-25 Thread Steven Schveighoffer via Digitalmars-d
On 10/25/16 12:57 PM, Dmitry Olshansky wrote: On 10/12/16 3:53 PM, Andrei Alexandrescu wrote: So we've had a good run with making popFront smaller. In ASCII microbenchmarks with ldc, the speed is indistinguishable from s = s[1 .. $]. Smaller functions make sure that the impact on instruction

Re: Reducing the cost of autodecoding

2016-10-25 Thread Dmitry Olshansky via Digitalmars-d
On 10/12/16 3:53 PM, Andrei Alexandrescu wrote: So we've had a good run with making popFront smaller. In ASCII microbenchmarks with ldc, the speed is indistinguishable from s = s[1 .. $]. Smaller functions make sure that the impact on instruction cache in larger applications is not high. [snip]

Re: Reducing the cost of autodecoding

2016-10-17 Thread ketmar via Digitalmars-d
On Saturday, 15 October 2016 at 18:40:11 UTC, Uplink_Coder wrote: have you seen this[1] link? it is almost what you're doing, but with some more nice properties. [1] http://bjoern.hoehrmann.de/utf-8/decoder/dfa/

Re: Reducing the cost of autodecoding

2016-10-16 Thread Patrick Schluter via Digitalmars-d
On Sunday, 16 October 2016 at 08:43:23 UTC, Uplink_Coder wrote: On Sunday, 16 October 2016 at 07:59:16 UTC, Patrick Schluter wrote: This looks quite slow. We already have a correct version in utf.decodeImpl. The goal here was to find a small and fast alternative. I know but it has to be

Re: Reducing the cost of autodecoding

2016-10-16 Thread NoBrainer via Digitalmars-d
On Sunday, 16 October 2016 at 08:43:23 UTC, Uplink_Coder wrote: This looks quite slow. We already have a correct version in utf.decodeImpl. The goal here was to find a small and fast alternative. A: What's 2 + 2? B: 3 A: That's wrong. B: But incredibly fast.

Re: Reducing the cost of autodecoding

2016-10-16 Thread Uplink_Coder via Digitalmars-d
On Sunday, 16 October 2016 at 07:59:16 UTC, Patrick Schluter wrote: Here my version. It's probably not the shortest (100 ligns of assembly with LDC) but it is correct and has following properties: - Performance proportional to the encoding length - Detects Invalid byte sequences - Detects

Re: Reducing the cost of autodecoding

2016-10-16 Thread Patrick Schluter via Digitalmars-d
Here my version. It's probably not the shortest (100 ligns of assembly with LDC) but it is correct and has following properties: - Performance proportional to the encoding length - Detects Invalid byte sequences - Detects Overlong encodings - Detects Invalid code points I put the exception to

Re: Reducing the cost of autodecoding

2016-10-16 Thread Patrick Schluter via Digitalmars-d
On Saturday, 15 October 2016 at 21:21:22 UTC, Stefan Koch wrote: On Saturday, 15 October 2016 at 19:42:03 UTC, Uplink_Coder wrote: On Saturday, 15 October 2016 at 19:07:50 UTC, Patrick Schluter wrote: At least with that lookup table below, you can detect isolated continuation bytes (192 and

Re: Reducing the cost of autodecoding

2016-10-16 Thread Stefan Koch via Digitalmars-d
On Saturday, 15 October 2016 at 19:42:03 UTC, Uplink_Coder wrote: On Saturday, 15 October 2016 at 19:07:50 UTC, Patrick Schluter wrote: At least with that lookup table below, you can detect isolated continuation bytes (192 and 193) and invalid codes (above 244). __gshared static immutable

Re: Reducing the cost of autodecoding

2016-10-16 Thread Patrick Schluter via Digitalmars-d
On Sunday, 16 October 2016 at 10:05:37 UTC, Patrick Schluter wrote: Next step will be to loop for length 2,3,4, with or without your table. Ok now the looped version which doesn't need the lookup table. This one assembles in 72 lines of assembly (25 lines only for the exception code).

Re: Reducing the cost of autodecoding

2016-10-16 Thread Patrick Schluter via Digitalmars-d
On Sunday, 16 October 2016 at 10:05:37 UTC, Patrick Schluter wrote: On Sunday, 16 October 2016 at 08:43:23 UTC, Uplink_Coder wrote: On Sunday, 16 October 2016 at 07:59:16 UTC, Patrick Schluter wrote: This looks quite slow. We already have a correct version in utf.decodeImpl. The goal here

Re: Reducing the cost of autodecoding

2016-10-16 Thread Uplink_Coder via Digitalmars-d
On Saturday, 15 October 2016 at 19:07:50 UTC, Patrick Schluter wrote: At least with that lookup table below, you can detect isolated continuation bytes (192 and 193) and invalid codes (above 244). __gshared static immutable ubyte[] charWidthTab = [ 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2,

Re: Reducing the cost of autodecoding

2016-10-16 Thread safety0ff via Digitalmars-d
On Saturday, 15 October 2016 at 19:00:12 UTC, Patrick Schluter wrote: Just a question. Do encoding errors not have to be detected or is validity of the string guaranteed? AFAIK they have to be detected, otherwise it would be a regression.

Re: Reducing the cost of autodecoding

2016-10-16 Thread Patrick Schluter via Digitalmars-d
On Saturday, 15 October 2016 at 19:07:50 UTC, Patrick Schluter wrote: At least with that lookup table below, you can detect isolated continuation bytes (192 and 193) and invalid codes (above 244). 192 and 193 can never appear in a UTF-8 text, they are overlongs not continuation bytes.

Re: Reducing the cost of autodecoding

2016-10-16 Thread Patrick Schluter via Digitalmars-d
At least with that lookup table below, you can detect isolated continuation bytes (192 and 193) and invalid codes (above 244). __gshared static immutable ubyte[] charWidthTab = [ 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,

Re: Reducing the cost of autodecoding

2016-10-16 Thread Patrick Schluter via Digitalmars-d
On Saturday, 15 October 2016 at 18:40:11 UTC, Uplink_Coder wrote: It can also be written like this producing smaller code. But it the cost of slower decoding. dchar myFront(ref char[] str) pure { dchar c = cast(dchar) str.ptr[0]; if (c & 128) { if (c & 64) {

Re: Reducing the cost of autodecoding

2016-10-16 Thread Uplink_Coder via Digitalmars-d
It can also be written like this producing smaller code. But it the cost of slower decoding. dchar myFront(ref char[] str) pure { dchar c = cast(dchar) str.ptr[0]; if (c & 128) { if (c & 64) { int idx = 0; int l = charWidthTab.ptr[c - 192];

Re: Reducing the cost of autodecoding

2016-10-16 Thread Andrei Alexandrescu via Digitalmars-d
On 10/15/2016 12:42 PM, Patrick Schluter wrote: Sorry if I do not participate on the testing as I don't have a proper compilation environment here at home. https://ldc.acomirei.ru Andrei

Re: Reducing the cost of autodecoding

2016-10-16 Thread Patrick Schluter via Digitalmars-d
Oooops, I should not post after drinking 2 glasses of Châteauneuf-du-pape. That function does exactly the contrary of what popFront does. This one is conversion from dchar to multibyte not multibyte to dchar as you did. Sorry for the inconvenience.

Re: Reducing the cost of autodecoding

2016-10-16 Thread Patrick Schluter via Digitalmars-d
On Saturday, 15 October 2016 at 00:50:08 UTC, Stefan Koch wrote: On Friday, 14 October 2016 at 20:47:39 UTC, Stefan Koch wrote: On Thursday, 13 October 2016 at 21:49:22 UTC, safety0ff wrote: Bad benchmark! Bad! -- Andrei Also, I suspect a benchmark with a larger loop body might not benefit

Re: Reducing the cost of autodecoding

2016-10-15 Thread safety0ff via Digitalmars-d
On Friday, 14 October 2016 at 20:47:39 UTC, Stefan Koch wrote: On Thursday, 13 October 2016 at 21:49:22 UTC, safety0ff wrote: Bad benchmark! Bad! -- Andrei Also, I suspect a benchmark with a larger loop body might not benefit as significantly from branch hints as this one. I disagree in

Re: Reducing the cost of autodecoding

2016-10-15 Thread Stefan Koch via Digitalmars-d
On Friday, 14 October 2016 at 20:47:39 UTC, Stefan Koch wrote: On Thursday, 13 October 2016 at 21:49:22 UTC, safety0ff wrote: Bad benchmark! Bad! -- Andrei Also, I suspect a benchmark with a larger loop body might not benefit as significantly from branch hints as this one. I disagree in

Re: Reducing the cost of autodecoding

2016-10-15 Thread Stefan Koch via Digitalmars-d
On Thursday, 13 October 2016 at 21:49:22 UTC, safety0ff wrote: Bad benchmark! Bad! -- Andrei Also, I suspect a benchmark with a larger loop body might not benefit as significantly from branch hints as this one. I disagree in longer loops code compactness is as important as in small ones.

Re: Reducing the cost of autodecoding

2016-10-13 Thread safety0ff via Digitalmars-d
On Thursday, 13 October 2016 at 01:36:44 UTC, Andrei Alexandrescu wrote: Oh ok, so it's that checksum in particular that got optimized. Bad benchmark! Bad! -- Andrei Also, I suspect a benchmark with a larger loop body might not benefit as significantly from branch hints as this one.

Re: Reducing the cost of autodecoding

2016-10-13 Thread safety0ff via Digitalmars-d
On Thursday, 13 October 2016 at 14:51:50 UTC, Kagamin wrote: On Wednesday, 12 October 2016 at 20:24:54 UTC, safety0ff wrote: Code: http://pastebin.com/CFCpUftW Line 25 doesn't look trusted: reads past the end of an empty string. Length is checked in the loop that calls this function. In

Re: Reducing the cost of autodecoding

2016-10-13 Thread Kagamin via Digitalmars-d
On Wednesday, 12 October 2016 at 20:24:54 UTC, safety0ff wrote: Code: http://pastebin.com/CFCpUftW Line 25 doesn't look trusted: reads past the end of an empty string.

Re: Reducing the cost of autodecoding

2016-10-13 Thread Jacob Carlborg via Digitalmars-d
On 2016-10-13 03:26, Andrei Alexandrescu wrote: Yah, shouldn't go in object.d as it's fairly niche. On the other hand defining a new module for two functions seems excessive unless we have a good theme. On the third hand we may find an existing module that's topically close. Thoughts? -- Andrei

Re: Reducing the cost of autodecoding

2016-10-13 Thread Johan Engelen via Digitalmars-d
On Thursday, 13 October 2016 at 01:26:17 UTC, Andrei Alexandrescu wrote: On 10/12/2016 08:11 PM, Stefan Koch wrote: We should probably introduce a new module for stuff like this. object.d is already filled with too much unrelated things. Yah, shouldn't go in object.d as it's fairly niche. On

Re: Reducing the cost of autodecoding

2016-10-12 Thread Andrei Alexandrescu via Digitalmars-d
On 10/12/2016 09:35 PM, Stefan Koch wrote: On Thursday, 13 October 2016 at 01:27:35 UTC, Andrei Alexandrescu wrote: On 10/12/2016 08:41 PM, safety0ff wrote: On Thursday, 13 October 2016 at 00:32:36 UTC, safety0ff wrote: It made little difference: LDC compiled into AVX2 vectorized addition

Re: Reducing the cost of autodecoding

2016-10-12 Thread Stefan Koch via Digitalmars-d
On Thursday, 13 October 2016 at 01:27:35 UTC, Andrei Alexandrescu wrote: On 10/12/2016 08:41 PM, safety0ff wrote: On Thursday, 13 October 2016 at 00:32:36 UTC, safety0ff wrote: It made little difference: LDC compiled into AVX2 vectorized addition (vpmovzxbq & vpaddq.) Measurements without

Re: Reducing the cost of autodecoding

2016-10-12 Thread Stefan Koch via Digitalmars-d
On Thursday, 13 October 2016 at 01:26:17 UTC, Andrei Alexandrescu wrote: On 10/12/2016 08:11 PM, Stefan Koch wrote: We should probably introduce a new module for stuff like this. object.d is already filled with too much unrelated things. Yah, shouldn't go in object.d as it's fairly niche. On

Re: Reducing the cost of autodecoding

2016-10-12 Thread Andrei Alexandrescu via Digitalmars-d
On 10/12/2016 08:41 PM, safety0ff wrote: On Thursday, 13 October 2016 at 00:32:36 UTC, safety0ff wrote: It made little difference: LDC compiled into AVX2 vectorized addition (vpmovzxbq & vpaddq.) Measurements without -mcpu=native: overhead 0.336s bytes0.610s without branch hints 0.852s

Re: Reducing the cost of autodecoding

2016-10-12 Thread Andrei Alexandrescu via Digitalmars-d
On 10/12/2016 08:11 PM, Stefan Koch wrote: We should probably introduce a new module for stuff like this. object.d is already filled with too much unrelated things. Yah, shouldn't go in object.d as it's fairly niche. On the other hand defining a new module for two functions seems excessive

Re: Reducing the cost of autodecoding

2016-10-12 Thread safety0ff via Digitalmars-d
On Thursday, 13 October 2016 at 00:32:36 UTC, safety0ff wrote: It made little difference: LDC compiled into AVX2 vectorized addition (vpmovzxbq & vpaddq.) Measurements without -mcpu=native: overhead 0.336s bytes0.610s without branch hints 0.852s code pasted 0.766s

Re: Reducing the cost of autodecoding

2016-10-12 Thread safety0ff via Digitalmars-d
On Wednesday, 12 October 2016 at 23:47:45 UTC, Andrei Alexandrescu wrote: Wait, so going through the bytes made almost no difference? Or did you subtract the overhead already? It made little difference: LDC compiled into AVX2 vectorized addition (vpmovzxbq & vpaddq.)

Re: Reducing the cost of autodecoding

2016-10-12 Thread Stefan Koch via Digitalmars-d
On Wednesday, 12 October 2016 at 23:59:15 UTC, Stefan Koch wrote: On Wednesday, 12 October 2016 at 23:47:45 UTC, Andrei Alexandrescu wrote: I think we should define two aliases "likely" and "unlikely" with default implementations: bool likely(bool b) { return b; } bool unlikely(bool b) {

Re: Reducing the cost of autodecoding

2016-10-12 Thread Stefan Koch via Digitalmars-d
On Wednesday, 12 October 2016 at 23:47:45 UTC, Andrei Alexandrescu wrote: I think we should define two aliases "likely" and "unlikely" with default implementations: bool likely(bool b) { return b; } bool unlikely(bool b) { return b; } They'd go in druntime. Then implementers can hook them

Re: Reducing the cost of autodecoding

2016-10-12 Thread Andrei Alexandrescu via Digitalmars-d
On 10/12/2016 04:02 PM, safety0ff wrote: On Wednesday, 12 October 2016 at 16:24:19 UTC, Andrei Alexandrescu wrote: Remember the ASCII part is the bothersome one. There's only two comparisons, all with 100% predictability. We should be able to arrange matters so the loss is negligible. --

Re: Reducing the cost of autodecoding

2016-10-12 Thread safety0ff via Digitalmars-d
On Wednesday, 12 October 2016 at 20:07:19 UTC, Stefan Koch wrote: where did you apply the branch hints ? Code: http://pastebin.com/CFCpUftW

Re: Reducing the cost of autodecoding

2016-10-12 Thread Stefan Koch via Digitalmars-d
On Wednesday, 12 October 2016 at 20:02:16 UTC, safety0ff wrote: On Wednesday, 12 October 2016 at 16:24:19 UTC, Andrei Alexandrescu wrote: Remember the ASCII part is the bothersome one. There's only two comparisons, all with 100% predictability. We should be able to arrange matters so the

Re: Reducing the cost of autodecoding

2016-10-12 Thread safety0ff via Digitalmars-d
On Wednesday, 12 October 2016 at 16:24:19 UTC, Andrei Alexandrescu wrote: Remember the ASCII part is the bothersome one. There's only two comparisons, all with 100% predictability. We should be able to arrange matters so the loss is negligible. -- Andrei My measurements: ldc -O3

Re: Reducing the cost of autodecoding

2016-10-12 Thread Johan Engelen via Digitalmars-d
On Wednesday, 12 October 2016 at 13:53:03 UTC, Andrei Alexandrescu wrote: On my machine, with "ldc2 -release -O3 -enable-inlining" "-O3 -enable-inlining" is synonymous with "-O3" :-) With LDC 1.1.0-beta3, you can try with "-enable-cross-module-inlining". It won't cross-module inline

Re: Reducing the cost of autodecoding

2016-10-12 Thread Andrei Alexandrescu via Digitalmars-d
On 10/12/2016 12:03 PM, Stefan Koch wrote: This will only work really efficiently with some state on the stack. Remember the ASCII part is the bothersome one. There's only two comparisons, all with 100% predictability. We should be able to arrange matters so the loss is negligible. -- Andrei

Re: Reducing the cost of autodecoding

2016-10-12 Thread Ilya Yaroshenko via Digitalmars-d
On Wednesday, 12 October 2016 at 16:07:39 UTC, Ilya Yaroshenko wrote: On Wednesday, 12 October 2016 at 13:53:03 UTC, Andrei Alexandrescu wrote: So we've had a good run with making popFront smaller. In ASCII microbenchmarks with ldc, the speed is indistinguishable from s = s[1 .. $]. Smaller

Re: Reducing the cost of autodecoding

2016-10-12 Thread Ilya Yaroshenko via Digitalmars-d
On Wednesday, 12 October 2016 at 13:53:03 UTC, Andrei Alexandrescu wrote: So we've had a good run with making popFront smaller. In ASCII microbenchmarks with ldc, the speed is indistinguishable from s = s[1 .. $]. Smaller functions make sure that the impact on instruction cache in larger

Re: Reducing the cost of autodecoding

2016-10-12 Thread Stefan Koch via Digitalmars-d
On Wednesday, 12 October 2016 at 13:53:03 UTC, Andrei Alexandrescu wrote: So we've had a good run with making popFront smaller. In ASCII microbenchmarks with ldc, the speed is indistinguishable from s = s[1 .. $]. Smaller functions make sure that the impact on instruction cache in larger

Reducing the cost of autodecoding

2016-10-12 Thread Andrei Alexandrescu via Digitalmars-d
So we've had a good run with making popFront smaller. In ASCII microbenchmarks with ldc, the speed is indistinguishable from s = s[1 .. $]. Smaller functions make sure that the impact on instruction cache in larger applications is not high. Now it's time to look at the end-to-end cost of