On Tuesday, 25 October 2016 at 21:46:30 UTC, safety0ff wrote:
P.S. I am aware that this pessimises popFront for code which
only counts codepoints without inspecting them.
Unfortunately it also changes the API of popFront to throw on
invalid characters.
So the example would need to be
On Wednesday, 12 October 2016 at 13:53:03 UTC, Andrei
Alexandrescu wrote:
Now it's time to look at the end-to-end cost of autodecoding.
Some food for thought:
- front necessarily needs to compute the number of bytes to
advance.
- We can't change the API to share data between front and
On 10/25/16 3:32 PM, Stefam Koch wrote:
On Tuesday, 25 October 2016 at 19:16:07 UTC, Steven Schveighoffer wrote:
Should behave better on 32-bit system.
You could also store 3-bits to avoid extra addition.
The whole point of a LUT to begin with is to reduce instructions.
Yes, I know. But
On Tuesday, 25 October 2016 at 19:16:07 UTC, Steven Schveighoffer
wrote:
On 10/25/16 12:57 PM, Dmitry Olshansky wrote:
On 10/12/16 3:53 PM, Andrei Alexandrescu wrote:
So we've had a good run with making popFront smaller. In ASCII
microbenchmarks with ldc, the speed is indistinguishable from
s
On 10/25/16 12:57 PM, Dmitry Olshansky wrote:
On 10/12/16 3:53 PM, Andrei Alexandrescu wrote:
So we've had a good run with making popFront smaller. In ASCII
microbenchmarks with ldc, the speed is indistinguishable from s = s[1 ..
$]. Smaller functions make sure that the impact on instruction
On 10/12/16 3:53 PM, Andrei Alexandrescu wrote:
So we've had a good run with making popFront smaller. In ASCII
microbenchmarks with ldc, the speed is indistinguishable from s = s[1 ..
$]. Smaller functions make sure that the impact on instruction cache in
larger applications is not high.
[snip]
On Saturday, 15 October 2016 at 18:40:11 UTC, Uplink_Coder wrote:
have you seen this[1] link? it is almost what you're doing, but
with some more nice properties.
[1] http://bjoern.hoehrmann.de/utf-8/decoder/dfa/
On Sunday, 16 October 2016 at 08:43:23 UTC, Uplink_Coder wrote:
On Sunday, 16 October 2016 at 07:59:16 UTC, Patrick Schluter
wrote:
This looks quite slow.
We already have a correct version in utf.decodeImpl.
The goal here was to find a small and fast alternative.
I know but it has to be
On Sunday, 16 October 2016 at 08:43:23 UTC, Uplink_Coder wrote:
This looks quite slow.
We already have a correct version in utf.decodeImpl.
The goal here was to find a small and fast alternative.
A: What's 2 + 2?
B: 3
A: That's wrong.
B: But incredibly fast.
On Sunday, 16 October 2016 at 07:59:16 UTC, Patrick Schluter
wrote:
Here my version. It's probably not the shortest (100 ligns of
assembly with LDC) but it is correct and has following
properties:
- Performance proportional to the encoding length
- Detects Invalid byte sequences
- Detects
Here my version. It's probably not the shortest (100 ligns of
assembly with LDC) but it is correct and has following properties:
- Performance proportional to the encoding length
- Detects Invalid byte sequences
- Detects Overlong encodings
- Detects Invalid code points
I put the exception to
On Saturday, 15 October 2016 at 21:21:22 UTC, Stefan Koch wrote:
On Saturday, 15 October 2016 at 19:42:03 UTC, Uplink_Coder
wrote:
On Saturday, 15 October 2016 at 19:07:50 UTC, Patrick Schluter
wrote:
At least with that lookup table below, you can detect
isolated continuation bytes (192 and
On Saturday, 15 October 2016 at 19:42:03 UTC, Uplink_Coder wrote:
On Saturday, 15 October 2016 at 19:07:50 UTC, Patrick Schluter
wrote:
At least with that lookup table below, you can detect isolated
continuation bytes (192 and 193) and invalid codes (above 244).
__gshared static immutable
On Sunday, 16 October 2016 at 10:05:37 UTC, Patrick Schluter
wrote:
Next step will be to loop for length 2,3,4, with or without
your table.
Ok now the looped version which doesn't need the lookup table.
This one assembles in 72 lines of assembly (25 lines only for the
exception code).
On Sunday, 16 October 2016 at 10:05:37 UTC, Patrick Schluter
wrote:
On Sunday, 16 October 2016 at 08:43:23 UTC, Uplink_Coder wrote:
On Sunday, 16 October 2016 at 07:59:16 UTC, Patrick Schluter
wrote:
This looks quite slow.
We already have a correct version in utf.decodeImpl.
The goal here
On Saturday, 15 October 2016 at 19:07:50 UTC, Patrick Schluter
wrote:
At least with that lookup table below, you can detect isolated
continuation bytes (192 and 193) and invalid codes (above 244).
__gshared static immutable ubyte[] charWidthTab = [
1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2,
On Saturday, 15 October 2016 at 19:00:12 UTC, Patrick Schluter
wrote:
Just a question. Do encoding errors not have to be detected or
is validity of the string guaranteed?
AFAIK they have to be detected, otherwise it would be a
regression.
On Saturday, 15 October 2016 at 19:07:50 UTC, Patrick Schluter
wrote:
At least with that lookup table below, you can detect isolated
continuation bytes (192 and 193) and invalid codes (above 244).
192 and 193 can never appear in a UTF-8 text, they are overlongs
not continuation bytes.
At least with that lookup table below, you can detect isolated
continuation bytes (192 and 193) and invalid codes (above 244).
__gshared static immutable ubyte[] charWidthTab = [
1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
On Saturday, 15 October 2016 at 18:40:11 UTC, Uplink_Coder wrote:
It can also be written like this producing smaller code.
But it the cost of slower decoding.
dchar myFront(ref char[] str) pure
{
dchar c = cast(dchar) str.ptr[0];
if (c & 128)
{
if (c & 64)
{
It can also be written like this producing smaller code.
But it the cost of slower decoding.
dchar myFront(ref char[] str) pure
{
dchar c = cast(dchar) str.ptr[0];
if (c & 128)
{
if (c & 64)
{
int idx = 0;
int l = charWidthTab.ptr[c - 192];
On 10/15/2016 12:42 PM, Patrick Schluter wrote:
Sorry if I do not participate on the testing as I don't have a proper
compilation environment here at home.
https://ldc.acomirei.ru
Andrei
Oooops, I should not post after drinking 2 glasses of
Châteauneuf-du-pape. That function does exactly the contrary of
what popFront does. This one is conversion from dchar to
multibyte not multibyte to dchar as you did.
Sorry for the inconvenience.
On Saturday, 15 October 2016 at 00:50:08 UTC, Stefan Koch wrote:
On Friday, 14 October 2016 at 20:47:39 UTC, Stefan Koch wrote:
On Thursday, 13 October 2016 at 21:49:22 UTC, safety0ff wrote:
Bad benchmark! Bad! -- Andrei
Also, I suspect a benchmark with a larger loop body might not
benefit
On Friday, 14 October 2016 at 20:47:39 UTC, Stefan Koch wrote:
On Thursday, 13 October 2016 at 21:49:22 UTC, safety0ff wrote:
Bad benchmark! Bad! -- Andrei
Also, I suspect a benchmark with a larger loop body might not
benefit as significantly from branch hints as this one.
I disagree in
On Friday, 14 October 2016 at 20:47:39 UTC, Stefan Koch wrote:
On Thursday, 13 October 2016 at 21:49:22 UTC, safety0ff wrote:
Bad benchmark! Bad! -- Andrei
Also, I suspect a benchmark with a larger loop body might not
benefit as significantly from branch hints as this one.
I disagree in
On Thursday, 13 October 2016 at 21:49:22 UTC, safety0ff wrote:
Bad benchmark! Bad! -- Andrei
Also, I suspect a benchmark with a larger loop body might not
benefit as significantly from branch hints as this one.
I disagree in longer loops code compactness is as important as in
small ones.
On Thursday, 13 October 2016 at 01:36:44 UTC, Andrei Alexandrescu
wrote:
Oh ok, so it's that checksum in particular that got optimized.
Bad benchmark! Bad! -- Andrei
Also, I suspect a benchmark with a larger loop body might not
benefit as significantly from branch hints as this one.
On Thursday, 13 October 2016 at 14:51:50 UTC, Kagamin wrote:
On Wednesday, 12 October 2016 at 20:24:54 UTC, safety0ff wrote:
Code: http://pastebin.com/CFCpUftW
Line 25 doesn't look trusted: reads past the end of an empty
string.
Length is checked in the loop that calls this function.
In
On Wednesday, 12 October 2016 at 20:24:54 UTC, safety0ff wrote:
Code: http://pastebin.com/CFCpUftW
Line 25 doesn't look trusted: reads past the end of an empty
string.
On 2016-10-13 03:26, Andrei Alexandrescu wrote:
Yah, shouldn't go in object.d as it's fairly niche. On the other hand
defining a new module for two functions seems excessive unless we have a
good theme. On the third hand we may find an existing module that's
topically close. Thoughts? -- Andrei
On Thursday, 13 October 2016 at 01:26:17 UTC, Andrei Alexandrescu
wrote:
On 10/12/2016 08:11 PM, Stefan Koch wrote:
We should probably introduce a new module for stuff like this.
object.d is already filled with too much unrelated things.
Yah, shouldn't go in object.d as it's fairly niche. On
On 10/12/2016 09:35 PM, Stefan Koch wrote:
On Thursday, 13 October 2016 at 01:27:35 UTC, Andrei Alexandrescu wrote:
On 10/12/2016 08:41 PM, safety0ff wrote:
On Thursday, 13 October 2016 at 00:32:36 UTC, safety0ff wrote:
It made little difference: LDC compiled into AVX2 vectorized addition
On Thursday, 13 October 2016 at 01:27:35 UTC, Andrei Alexandrescu
wrote:
On 10/12/2016 08:41 PM, safety0ff wrote:
On Thursday, 13 October 2016 at 00:32:36 UTC, safety0ff wrote:
It made little difference: LDC compiled into AVX2 vectorized
addition
(vpmovzxbq & vpaddq.)
Measurements without
On Thursday, 13 October 2016 at 01:26:17 UTC, Andrei Alexandrescu
wrote:
On 10/12/2016 08:11 PM, Stefan Koch wrote:
We should probably introduce a new module for stuff like this.
object.d is already filled with too much unrelated things.
Yah, shouldn't go in object.d as it's fairly niche. On
On 10/12/2016 08:41 PM, safety0ff wrote:
On Thursday, 13 October 2016 at 00:32:36 UTC, safety0ff wrote:
It made little difference: LDC compiled into AVX2 vectorized addition
(vpmovzxbq & vpaddq.)
Measurements without -mcpu=native:
overhead 0.336s
bytes0.610s
without branch hints 0.852s
On 10/12/2016 08:11 PM, Stefan Koch wrote:
We should probably introduce a new module for stuff like this.
object.d is already filled with too much unrelated things.
Yah, shouldn't go in object.d as it's fairly niche. On the other hand
defining a new module for two functions seems excessive
On Thursday, 13 October 2016 at 00:32:36 UTC, safety0ff wrote:
It made little difference: LDC compiled into AVX2 vectorized
addition (vpmovzxbq & vpaddq.)
Measurements without -mcpu=native:
overhead 0.336s
bytes0.610s
without branch hints 0.852s
code pasted 0.766s
On Wednesday, 12 October 2016 at 23:47:45 UTC, Andrei
Alexandrescu wrote:
Wait, so going through the bytes made almost no difference? Or
did you subtract the overhead already?
It made little difference: LDC compiled into AVX2 vectorized
addition (vpmovzxbq & vpaddq.)
On Wednesday, 12 October 2016 at 23:59:15 UTC, Stefan Koch wrote:
On Wednesday, 12 October 2016 at 23:47:45 UTC, Andrei
Alexandrescu wrote:
I think we should define two aliases "likely" and "unlikely"
with default implementations:
bool likely(bool b) { return b; }
bool unlikely(bool b) {
On Wednesday, 12 October 2016 at 23:47:45 UTC, Andrei
Alexandrescu wrote:
I think we should define two aliases "likely" and "unlikely"
with default implementations:
bool likely(bool b) { return b; }
bool unlikely(bool b) { return b; }
They'd go in druntime. Then implementers can hook them
On 10/12/2016 04:02 PM, safety0ff wrote:
On Wednesday, 12 October 2016 at 16:24:19 UTC, Andrei Alexandrescu wrote:
Remember the ASCII part is the bothersome one. There's only two
comparisons, all with 100% predictability. We should be able to
arrange matters so the loss is negligible. --
On Wednesday, 12 October 2016 at 20:07:19 UTC, Stefan Koch wrote:
where did you apply the branch hints ?
Code: http://pastebin.com/CFCpUftW
On Wednesday, 12 October 2016 at 20:02:16 UTC, safety0ff wrote:
On Wednesday, 12 October 2016 at 16:24:19 UTC, Andrei
Alexandrescu wrote:
Remember the ASCII part is the bothersome one. There's only
two comparisons, all with 100% predictability. We should be
able to arrange matters so the
On Wednesday, 12 October 2016 at 16:24:19 UTC, Andrei
Alexandrescu wrote:
Remember the ASCII part is the bothersome one. There's only two
comparisons, all with 100% predictability. We should be able to
arrange matters so the loss is negligible. -- Andrei
My measurements:
ldc -O3
On Wednesday, 12 October 2016 at 13:53:03 UTC, Andrei
Alexandrescu wrote:
On my machine, with "ldc2 -release -O3 -enable-inlining"
"-O3 -enable-inlining" is synonymous with "-O3" :-)
With LDC 1.1.0-beta3, you can try with
"-enable-cross-module-inlining". It won't cross-module inline
On 10/12/2016 12:03 PM, Stefan Koch wrote:
This will only work really efficiently with some state on the stack.
Remember the ASCII part is the bothersome one. There's only two
comparisons, all with 100% predictability. We should be able to arrange
matters so the loss is negligible. -- Andrei
On Wednesday, 12 October 2016 at 16:07:39 UTC, Ilya Yaroshenko
wrote:
On Wednesday, 12 October 2016 at 13:53:03 UTC, Andrei
Alexandrescu wrote:
So we've had a good run with making popFront smaller. In ASCII
microbenchmarks with ldc, the speed is indistinguishable from
s = s[1 .. $]. Smaller
On Wednesday, 12 October 2016 at 13:53:03 UTC, Andrei
Alexandrescu wrote:
So we've had a good run with making popFront smaller. In ASCII
microbenchmarks with ldc, the speed is indistinguishable from s
= s[1 .. $]. Smaller functions make sure that the impact on
instruction cache in larger
On Wednesday, 12 October 2016 at 13:53:03 UTC, Andrei
Alexandrescu wrote:
So we've had a good run with making popFront smaller. In ASCII
microbenchmarks with ldc, the speed is indistinguishable from s
= s[1 .. $]. Smaller functions make sure that the impact on
instruction cache in larger
So we've had a good run with making popFront smaller. In ASCII
microbenchmarks with ldc, the speed is indistinguishable from s = s[1 ..
$]. Smaller functions make sure that the impact on instruction cache in
larger applications is not high.
Now it's time to look at the end-to-end cost of
51 matches
Mail list logo