Re: Can you shrink it further?

2016-10-12 Thread Matthias Bentrup via Digitalmars-d
On Tuesday, 11 October 2016 at 15:01:47 UTC, Andrei Alexandrescu wrote: On 10/11/2016 10:49 AM, Matthias Bentrup wrote: void popFrontAsmIntel(ref char[] s) @trusted pure nothrow { immutable c = s[0]; if (c < 0x80) { s = s[1 .. $]; } else { uint l = void; asm pure nothrow @nogc

Re: Can you shrink it further?

2016-10-12 Thread Stefan Koch via Digitalmars-d
On Wednesday, 12 October 2016 at 08:56:59 UTC, Matthias Bentrup wrote: Here are three branch-less variants that use the sign instead of the carry bit. The last one is the fastest on my machine, although it mixes the rare error case and the common 1-byte case into one branch. void popFront1

Re: Any relation?

2016-10-12 Thread Seb via Digitalmars-d
On Tuesday, 11 October 2016 at 18:13:53 UTC, Andrei Alexandrescu wrote: http://indianautosblog.com/2016/10/most-powerful-suzuki-swift-produces-350-hp-25 -- Andrei Then maybe this isn't photoshopped: https://twitter.com/stamcd/status/742563964656062464

Re: Can you shrink it further?

2016-10-12 Thread Matthias Bentrup via Digitalmars-d
On Wednesday, 12 October 2016 at 09:23:53 UTC, Stefan Koch wrote: On Wednesday, 12 October 2016 at 08:56:59 UTC, Matthias Bentrup wrote: [...] All three are slower than baseline, for my test-case. What did you test it against? The blns.txt file mentioned upthread.

Re: Can you shrink it further?

2016-10-12 Thread Stefan Koch via Digitalmars-d
On Wednesday, 12 October 2016 at 10:15:17 UTC, Matthias Bentrup wrote: On Wednesday, 12 October 2016 at 09:23:53 UTC, Stefan Koch wrote: On Wednesday, 12 October 2016 at 08:56:59 UTC, Matthias Bentrup wrote: [...] All three are slower than baseline, for my test-case. What did you test it agai

Re: New encryption block...

2016-10-12 Thread Era Scarecrow via Digitalmars-d
On Sunday, 9 October 2016 at 20:33:29 UTC, Era Scarecrow wrote: Something coming to mind is the idea of making a small algorithm to be used with other already existing encryption functions to extend the blocksize of encryption with minimal complexity growth. For fun I'm experimenting with t

Re: Any relation?

2016-10-12 Thread Martin Krejcirik via Digitalmars-d
Then maybe this isn't photoshopped: https://twitter.com/stamcd/status/742563964656062464 Why would it be? It's a screenshot from the Forza Motorsport game.

Re: Can you shrink it further?

2016-10-12 Thread Stefan Koch via Digitalmars-d
I just confirmed that the branching version is faster than table-lookup. Please test it out for yourself: http://paste.ofcode.org/3CpieAhkrTYEcSncbPKbrj The table-lookup does produce the smallest code, though.
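For readers without access to the paste, the following is a minimal sketch of what a table-lookup popFront can look like. It is not the code from the linked paste: the names `makeSkipTable`, `skipTable`, and `popFrontTable` are hypothetical, and treating lead bytes 0xF8 .. 0xFF as invalid (skip one byte) is an assumption made for the example.

```d
// Sketch only; this is not the code from the linked paste.
// Skip lengths for lead bytes 0xC0 .. 0xFF, computed by CTFE.
ubyte[64] makeSkipTable() @safe pure nothrow @nogc
{
    ubyte[64] t;
    foreach (i; 0 .. 64)
    {
        immutable b = 0xC0 + i;
        ubyte n = 1;              // assumption: 0xF8 .. 0xFF are invalid, skip one byte
        if (b < 0xE0) n = 2;      // 110xxxxx: 2-byte sequence
        else if (b < 0xF0) n = 3; // 1110xxxx: 3-byte sequence
        else if (b < 0xF8) n = 4; // 11110xxx: 4-byte sequence
        t[i] = n;
    }
    return t;
}

immutable ubyte[64] skipTable = makeSkipTable();

// ASCII and stray continuation bytes take the branch and never touch the table.
void popFrontTable(ref const(char)[] s) @safe pure nothrow @nogc
{
    immutable c = s[0];
    size_t n = 1;
    if (c >= 0xC0)
    {
        n = skipTable[c - 0xC0];
        if (n > s.length) n = s.length; // truncated sequence: stop at the end
    }
    s = s[n .. $];
}

unittest
{
    const(char)[] s = "a\xC3\xA9z"; // 'a', a 2-byte sequence, 'z'
    popFrontTable(s);
    assert(s.length == 3);
    popFrontTable(s);
    assert(s == "z");
}
```

The table is only consulted for bytes at or above 0xC0, so ASCII-only text pays for a single well-predicted comparison.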

Re: Can you shrink it further?

2016-10-12 Thread Andrei Alexandrescu via Digitalmars-d
On 10/12/2016 04:56 AM, Matthias Bentrup wrote: void popFront1b(ref char[] s) @trusted pure nothrow { immutable c = cast(byte)s[0]; if (c >= -8) { s = s[1 .. $]; } else { uint i = 4 + (c + 64 >> 31) + (c + 32 >> 31) + (c + 16 >> 31); import std.algorithm; s = s[min(i, $) ..
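The sign-bit arithmetic in the quoted popFront1b can be checked in isolation. The sketch below pulls the expression into a hypothetical helper, `strideBySign` (the name is not from the thread), and relies on `>>` being an arithmetic shift on `int`, so each term is -1 when its sum is negative and 0 otherwise.

```d
// Hypothetical helper: how many bytes the quoted expression would skip
// for a given lead byte. Not part of the original post.
size_t strideBySign(char lead) @safe pure nothrow @nogc
{
    immutable int c = cast(byte) lead; // sign-extend: 0x80 .. 0xFF become -128 .. -1
    if (c >= -8)                       // ASCII, or 0xF8 .. 0xFF
        return 1;
    // Each shifted term is -1 when its sum is negative, 0 otherwise,
    // so the result is 4 minus the number of thresholds c falls below.
    return 4 + ((c + 64) >> 31) + ((c + 32) >> 31) + ((c + 16) >> 31);
}

unittest
{
    assert(strideBySign('a') == 1);    // ASCII
    assert(strideBySign('\x80') == 1); // continuation byte: 4 - 3
    assert(strideBySign('\xC3') == 2); // 110xxxxx: 4 - 2
    assert(strideBySign('\xE2') == 3); // 1110xxxx: 4 - 1
    assert(strideBySign('\xF0') == 4); // 11110xxx: 4 - 0
}
```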

Re: Can you shrink it further?

2016-10-12 Thread Andrei Alexandrescu via Digitalmars-d
On 10/12/2016 05:23 AM, Stefan Koch wrote: All three are slower than baseline, for my test-case. What did you test it against? I'd say: (a) test for speed of ASCII-only text; (b) make it small. That's all we need. Nobody worries about 10-20% in multibyte-heavy text. -- Andrei

Re: Can you shrink it further?

2016-10-12 Thread Andrei Alexandrescu via Digitalmars-d
On 10/12/2016 06:56 AM, Stefan Koch wrote: I just confirmed that the branching version is faster than table-lookup. Please test it out for yourself: http://paste.ofcode.org/3CpieAhkrTYEcSncbPKbrj The table-lookup does produce the smallest code, though. Nice. I like that the table is NOT looked up o

Re: Can you shrink it further?

2016-10-12 Thread Stefan Koch via Digitalmars-d
On Wednesday, 12 October 2016 at 13:32:45 UTC, Stefan Koch wrote: On Wednesday, 12 October 2016 at 12:46:50 UTC, Andrei Alexandrescu wrote: In the second case, the compiler generates an inc for bumping the pointer and a dec for decreasing the length (small instructions). If the variable char_

Reducing the cost of autodecoding

2016-10-12 Thread Andrei Alexandrescu via Digitalmars-d
So we've had a good run with making popFront smaller. In ASCII microbenchmarks with ldc, the speed is indistinguishable from s = s[1 .. $]. Smaller functions make sure that the impact on instruction cache in larger applications is not high. Now it's time to look at the end-to-end cost of autod
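As a reminder of what autodecoding means in practice, here is a small sketch using Phobos' `walkLength` and `byCodeUnit`. The helper names `countDecoded` and `countCodeUnits` are made up for the example; the only point is that iterating a `string` as a range decodes code points, while `byCodeUnit` opts out.

```d
import std.range : walkLength;
import std.utf : byCodeUnit;

// Hypothetical helpers for illustration only.
size_t countDecoded(string s)   { return s.walkLength; }            // front/popFront decode UTF-8
size_t countCodeUnits(string s) { return s.byCodeUnit.walkLength; } // no decoding

unittest
{
    string s = "h\xC3\xA9llo";      // "héllo": 5 code points, 6 code units
    assert(s.length == 6);
    assert(countDecoded(s) == 5);   // pays the decoding cost per element
    assert(countCodeUnits(s) == 6); // plain byte-wise walk
}
```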

Re: Can you shrink it further?

2016-10-12 Thread Stefan Koch via Digitalmars-d
On Wednesday, 12 October 2016 at 12:46:50 UTC, Andrei Alexandrescu wrote: On 10/12/2016 06:56 AM, Stefan Koch wrote: I just confirmed that the branching version is faster than table-lookup. Please test it out for yourself: http://paste.ofcode.org/3CpieAhkrTYEcSncbPKbrj The table-lookup does produc

Re: Can you shrink it further?

2016-10-12 Thread Andrei Alexandrescu via Digitalmars-d
On 10/12/2016 09:39 AM, Stefan Koch wrote: On Wednesday, 12 October 2016 at 13:32:45 UTC, Stefan Koch wrote: On Wednesday, 12 October 2016 at 12:46:50 UTC, Andrei Alexandrescu wrote: In the second case, the compiler generates an inc for bumping the pointer and a dec for decreasing the length (

Re: Can you shrink it further?

2016-10-12 Thread Stefan Koch via Digitalmars-d
On Wednesday, 12 October 2016 at 14:12:30 UTC, Andrei Alexandrescu wrote: On 10/12/2016 09:39 AM, Stefan Koch wrote: Thanks! I'd say make sure there is exactly 0% loss on performance compared to the popFront in the ASCII case, and if so make a PR with the table version. -- Andrei I measur

Re: Can you shrink it further?

2016-10-12 Thread Andrei Alexandrescu via Digitalmars-d
On 10/12/2016 10:39 AM, Stefan Koch wrote: On Wednesday, 12 October 2016 at 14:12:30 UTC, Andrei Alexandrescu wrote: On 10/12/2016 09:39 AM, Stefan Koch wrote: Thanks! I'd say make sure there is exactly 0% loss on performance compared to the popFront in the ASCII case, and if so make a PR wi

Re: Reducing the cost of autodecoding

2016-10-12 Thread Stefan Koch via Digitalmars-d
On Wednesday, 12 October 2016 at 13:53:03 UTC, Andrei Alexandrescu wrote: So we've had a good run with making popFront smaller. In ASCII microbenchmarks with ldc, the speed is indistinguishable from s = s[1 .. $]. Smaller functions make sure that the impact on instruction cache in larger applic

Re: Reducing the cost of autodecoding

2016-10-12 Thread Ilya Yaroshenko via Digitalmars-d
On Wednesday, 12 October 2016 at 13:53:03 UTC, Andrei Alexandrescu wrote: So we've had a good run with making popFront smaller. In ASCII microbenchmarks with ldc, the speed is indistinguishable from s = s[1 .. $]. Smaller functions make sure that the impact on instruction cache in larger applic

Re: Reducing the cost of autodecoding

2016-10-12 Thread Ilya Yaroshenko via Digitalmars-d
On Wednesday, 12 October 2016 at 16:07:39 UTC, Ilya Yaroshenko wrote: On Wednesday, 12 October 2016 at 13:53:03 UTC, Andrei Alexandrescu wrote: So we've had a good run with making popFront smaller. In ASCII microbenchmarks with ldc, the speed is indistinguishable from s = s[1 .. $]. Smaller fun

Re: Reducing the cost of autodecoding

2016-10-12 Thread Andrei Alexandrescu via Digitalmars-d
On 10/12/2016 12:03 PM, Stefan Koch wrote: This will only work really efficiently with some state on the stack. Remember the ASCII part is the bothersome one. There's only two comparisons, all with 100% predictability. We should be able to arrange matters so the loss is negligible. -- Andrei

Re: Can you shrink it further?

2016-10-12 Thread Stefan Koch via Digitalmars-d
On Wednesday, 12 October 2016 at 14:46:32 UTC, Andrei Alexandrescu wrote: No need. 1% for dmd is negligible. 25% would raise an eyebrow. -- Andrei Alright then PR: https://github.com/dlang/phobos/pull/4849

Re: Any front end experts n da house?

2016-10-12 Thread Stefan Koch via Digitalmars-d
On Wednesday, 12 October 2016 at 16:27:05 UTC, Andrei Alexandrescu wrote: So it would be great to get the super annoying https://issues.dlang.org/show_bug.cgi?id=259 to a conclusion, and it seems the similarly annoying https://issues.dlang.org/show_bug.cgi?id=14835 is in the way. If anyone wo

Re: Any front end experts n da house?

2016-10-12 Thread Andrei Alexandrescu via Digitalmars-d
On 10/12/2016 12:31 PM, Stefan Koch wrote: I can take a look at 259. 14835 is nothing trivial though. My understanding is Thomas has an attack on 259 once a solution to 14835 is up. -- Andrei

Any front end experts n da house?

2016-10-12 Thread Andrei Alexandrescu via Digitalmars-d
So it would be great to get the super annoying https://issues.dlang.org/show_bug.cgi?id=259 to a conclusion, and it seems the similarly annoying https://issues.dlang.org/show_bug.cgi?id=14835 is in the way. If anyone would like to look into the latter that would be great. Good regression test

Re: Can you shrink it further?

2016-10-12 Thread safety0ff via Digitalmars-d
My current favorites: void popFront(ref char[] s) @trusted pure nothrow { immutable byte c = s[0]; if (c >= -2) { s = s.ptr[1 .. s.length]; } else { import core.bitop; size_t i = 7u - bsr(~c); import std.algorithm; s = s.ptr[min(i, s.length) .. s.length]; } } I also e
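To see why `7u - bsr(~c)` yields the sequence length, the arithmetic can be isolated in a small helper. `strideOf` below is a hypothetical name, not from the post; the explicit casts are added only so that the example does not depend on integer-promotion rules.

```d
import core.bitop : bsr;

// Hypothetical helper: bytes to skip for a lead byte, using the same
// arithmetic as the snippet above.
size_t strideOf(char lead) @safe pure nothrow @nogc
{
    immutable byte c = cast(byte) lead;
    if (c >= -2) // ASCII (0x00 .. 0x7F) or the invalid bytes 0xFE, 0xFF
        return 1;
    // ~c turns the run of leading 1 bits into 0 bits, so bsr returns the
    // index of the first 0 bit of the lead byte, counted from bit 0.
    return 7u - bsr(cast(uint) ~cast(int) c);
}

unittest
{
    assert(strideOf('a') == 1);
    assert(strideOf('\x80') == 1); // 10xxxxxx: ~c = 0x7F, bsr = 6
    assert(strideOf('\xC3') == 2); // 110xxxxx: bsr = 5
    assert(strideOf('\xE2') == 3); // 1110xxxx: bsr = 4
    assert(strideOf('\xF0') == 4); // 11110xxx: bsr = 3
}
```

Unlike the 0xF8 cut-off used elsewhere in the thread, this variant reports lengths 5 and 6 for lead bytes 0xF8 .. 0xFD.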

Re: Can you shrink it further?

2016-10-12 Thread safety0ff via Digitalmars-d
On Wednesday, 12 October 2016 at 16:48:36 UTC, safety0ff wrote: [Snip] Didn't see the LUT implementation, nvm!

Re: Can you shrink it further?

2016-10-12 Thread Andrei Alexandrescu via Digitalmars-d
On 10/12/2016 01:05 PM, safety0ff wrote: On Wednesday, 12 October 2016 at 16:48:36 UTC, safety0ff wrote: [Snip] Didn't see the LUT implementation, nvm! Yah, that's pretty clever. Better yet, I suspect we can reuse the look-up table for front() as well. -- Andrei

Re: Reducing the cost of autodecoding

2016-10-12 Thread Johan Engelen via Digitalmars-d
On Wednesday, 12 October 2016 at 13:53:03 UTC, Andrei Alexandrescu wrote: On my machine, with "ldc2 -release -O3 -enable-inlining" "-O3 -enable-inlining" is synonymous with "-O3" :-) With LDC 1.1.0-beta3, you can try with "-enable-cross-module-inlining". It won't cross-module inline everyt

Re: Reducing the cost of autodecoding

2016-10-12 Thread safety0ff via Digitalmars-d
On Wednesday, 12 October 2016 at 16:24:19 UTC, Andrei Alexandrescu wrote: Remember the ASCII part is the bothersome one. There's only two comparisons, all with 100% predictability. We should be able to arrange matters so the loss is negligible. -- Andrei My measurements: ldc -O3 -boundschec

Re: Reducing the cost of autodecoding

2016-10-12 Thread Stefan Koch via Digitalmars-d
On Wednesday, 12 October 2016 at 20:02:16 UTC, safety0ff wrote: On Wednesday, 12 October 2016 at 16:24:19 UTC, Andrei Alexandrescu wrote: Remember the ASCII part is the bothersome one. There's only two comparisons, all with 100% predictability. We should be able to arrange matters so the loss

Re: Reducing the cost of autodecoding

2016-10-12 Thread safety0ff via Digitalmars-d
On Wednesday, 12 October 2016 at 20:07:19 UTC, Stefan Koch wrote: Where did you apply the branch hints? Code: http://pastebin.com/CFCpUftW

Re: Any front end experts n da house?

2016-10-12 Thread tsbockman via Digitalmars-d
On Wednesday, 12 October 2016 at 16:36:32 UTC, Andrei Alexandrescu wrote: On 10/12/2016 12:31 PM, Stefan Koch wrote: I can take a look at 259. 14835 is nothing trivial though. My understanding is Thomas has an attack on 259 once a solution to 14835 is up. -- Andrei Yes. The path to fix 259

Rust-style Ownership and Borrowing In D!

2016-10-12 Thread Nordlöw via Digitalmars-d
Hi! I've recently started a new job, and with that came new colleagues and new language discussions/wars ;) So there's this language Rust. And it provides some pretty amazing safety guarantees when it comes to memory management and algorithm correctness (preventing iterator invalidation) in both sin

Re: Any front end experts n da house?

2016-10-12 Thread Stefan Koch via Digitalmars-d
On Wednesday, 12 October 2016 at 22:16:38 UTC, tsbockman wrote: On Wednesday, 12 October 2016 at 16:36:32 UTC, Andrei Alexandrescu wrote: On 10/12/2016 12:31 PM, Stefan Koch wrote: I can take a look at 259. 14835 is nothing trivial though. My understanding is Thomas has an attack on 259 once

Re: Any front end experts n da house?

2016-10-12 Thread tsbockman via Digitalmars-d
On Wednesday, 12 October 2016 at 22:38:33 UTC, Stefan Koch wrote: On Wednesday, 12 October 2016 at 22:16:38 UTC, tsbockman wrote: Yes. The path to fix 259 is clear, and Lionello Lunesu and myself have already done most of the work. 14835 is a blocker due to the nature of the solution that Wal

Re: Reducing the cost of autodecoding

2016-10-12 Thread Andrei Alexandrescu via Digitalmars-d
On 10/12/2016 04:02 PM, safety0ff wrote: On Wednesday, 12 October 2016 at 16:24:19 UTC, Andrei Alexandrescu wrote: Remember the ASCII part is the bothersome one. There's only two comparisons, all with 100% predictability. We should be able to arrange matters so the loss is negligible. -- Andrei

Re: Reducing the cost of autodecoding

2016-10-12 Thread Stefan Koch via Digitalmars-d
On Wednesday, 12 October 2016 at 23:47:45 UTC, Andrei Alexandrescu wrote: I think we should define two aliases "likely" and "unlikely" with default implementations: bool likely(bool b) { return b; } bool unlikely(bool b) { return b; } They'd go in druntime. Then implementers can hook them in
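A minimal sketch of how the proposed default hints would read at a call site. `likely` and `unlikely` are written as in the quoted proposal (the attributes are added here); `countNonAscii` is a hypothetical caller invented for the example. With these portable defaults the hints do nothing, and a compiler-specific implementation would be expected to replace the bodies.

```d
// Portable defaults as proposed in the quoted message: plain pass-through.
bool likely(bool b)   @safe pure nothrow @nogc { return b; }
bool unlikely(bool b) @safe pure nothrow @nogc { return b; }

// Hypothetical caller: the non-ASCII branch is marked as the rare one.
size_t countNonAscii(const(char)[] s) @safe pure nothrow @nogc
{
    size_t n;
    foreach (char c; s)
    {
        if (unlikely(c >= 0x80))
            ++n;
    }
    return n;
}

unittest
{
    assert(countNonAscii("plain ascii") == 0);
    assert(countNonAscii("h\xC3\xA9llo") == 2); // the two bytes of one 2-byte sequence
}
```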

Re: Reducing the cost of autodecoding

2016-10-12 Thread Stefan Koch via Digitalmars-d
On Wednesday, 12 October 2016 at 23:59:15 UTC, Stefan Koch wrote: On Wednesday, 12 October 2016 at 23:47:45 UTC, Andrei Alexandrescu wrote: I think we should define two aliases "likely" and "unlikely" with default implementations: bool likely(bool b) { return b; } bool unlikely(bool b) { ret

Re: Reducing the cost of autodecoding

2016-10-12 Thread safety0ff via Digitalmars-d
On Wednesday, 12 October 2016 at 23:47:45 UTC, Andrei Alexandrescu wrote: Wait, so going through the bytes made almost no difference? Or did you subtract the overhead already? It made little difference: LDC compiled into AVX2 vectorized addition (vpmovzxbq & vpaddq.)

Re: Reducing the cost of autodecoding

2016-10-12 Thread safety0ff via Digitalmars-d
On Thursday, 13 October 2016 at 00:32:36 UTC, safety0ff wrote: It made little difference: LDC compiled into AVX2 vectorized addition (vpmovzxbq & vpaddq.) Measurements without -mcpu=native: overhead 0.336s; bytes 0.610s; without branch hints 0.852s; code pasted 0.766s

Re: Reducing the cost of autodecoding

2016-10-12 Thread Andrei Alexandrescu via Digitalmars-d
On 10/12/2016 08:11 PM, Stefan Koch wrote: We should probably introduce a new module for stuff like this. object.d is already filled with too many unrelated things. Yah, shouldn't go in object.d as it's fairly niche. On the other hand defining a new module for two functions seems excessive unl

Re: Reducing the cost of autodecoding

2016-10-12 Thread Andrei Alexandrescu via Digitalmars-d
On 10/12/2016 08:41 PM, safety0ff wrote: On Thursday, 13 October 2016 at 00:32:36 UTC, safety0ff wrote: It made little difference: LDC compiled into AVX2 vectorized addition (vpmovzxbq & vpaddq.) Measurements without -mcpu=native: overhead 0.336s; bytes 0.610s; without branch hints 0.852s; co

Re: Reducing the cost of autodecoding

2016-10-12 Thread Stefan Koch via Digitalmars-d
On Thursday, 13 October 2016 at 01:26:17 UTC, Andrei Alexandrescu wrote: On 10/12/2016 08:11 PM, Stefan Koch wrote: We should probably introduce a new module for stuff like this. object.d is already filled with too many unrelated things. Yah, shouldn't go in object.d as it's fairly niche. On t

Re: Reducing the cost of autodecoding

2016-10-12 Thread Stefan Koch via Digitalmars-d
On Thursday, 13 October 2016 at 01:27:35 UTC, Andrei Alexandrescu wrote: On 10/12/2016 08:41 PM, safety0ff wrote: On Thursday, 13 October 2016 at 00:32:36 UTC, safety0ff wrote: It made little difference: LDC compiled into AVX2 vectorized addition (vpmovzxbq & vpaddq.) Measurements without

Re: Reducing the cost of autodecoding

2016-10-12 Thread Andrei Alexandrescu via Digitalmars-d
On 10/12/2016 09:35 PM, Stefan Koch wrote: On Thursday, 13 October 2016 at 01:27:35 UTC, Andrei Alexandrescu wrote: On 10/12/2016 08:41 PM, safety0ff wrote: On Thursday, 13 October 2016 at 00:32:36 UTC, safety0ff wrote: It made little difference: LDC compiled into AVX2 vectorized addition (vp

Re: Any relation?

2016-10-12 Thread Neica Dorin via Digitalmars-d
On Tuesday, 11 October 2016 at 18:13:53 UTC, Andrei Alexandrescu wrote: http://indianautosblog.com/2016/10/most-powerful-suzuki-swift-produces-350-hp-25 -- Andrei Good day, dear Mr. Alexandrescu. I have studied a little of the new language you developed. I tried to run a few pr