Re: faster splitter

2016-07-11 Thread Henrique bucher via Digitalmars-d
On Monday, 30 May 2016 at 18:20:39 UTC, Andrei Alexandrescu wrote: On 05/30/2016 05:31 AM, qznc wrote: On Sunday, 29 May 2016 at 21:07:21 UTC, qznc wrote: worthwhile to use word loads [0] instead. Really fancy would be SSE. I wrote a splitter in SSE4.2 some time ago as acontribution to a

Re: faster splitter

2016-06-02 Thread qznc via Digitalmars-d
On Tuesday, 31 May 2016 at 22:50:37 UTC, David Nadlinger wrote: Another thing that might be interesting to do (now that you have a "clever" baseline) is to start counting cycles and make some comparisons against manual asm/intrinsics implementations. For short(-ish) needles, PCMPESTRI is

Re: faster splitter

2016-06-02 Thread qznc via Digitalmars-d
On Tuesday, 31 May 2016 at 21:29:34 UTC, Andrei Alexandrescu wrote: On 05/31/2016 04:18 PM, Chris wrote: I actually thought that dmd didn't place `computeSkip` inside of the loop. This begs the question if it should be moved to the loop, in case we use it in Phobos, to make sure that it is as

Re: faster splitter

2016-06-02 Thread Chris via Digitalmars-d
On Wednesday, 1 June 2016 at 14:32:38 UTC, Chris wrote: For fun, I've written a search function that alternates between the beginning and the end of a string. I'm basically using Andrei's search algorithm for this. It is, of course only useful, if `needle` is expected to be either towards the

Re: faster splitter

2016-06-01 Thread Chris via Digitalmars-d
On Wednesday, 1 June 2016 at 13:47:10 UTC, Patrick Schluter wrote: What I wanted to say, is that in real life, the input of the search routine is very often run-time user provided data. Think of search box in browsers and apps, command line parameter à la grep, etc. The "string" search

Re: faster splitter

2016-06-01 Thread Patrick Schluter via Digitalmars-d
On Wednesday, 1 June 2016 at 12:41:19 UTC, Seb wrote: On Wednesday, 1 June 2016 at 12:14:07 UTC, Patrick Schluter wrote: On Tuesday, 31 May 2016 at 17:54:34 UTC, qznc wrote: There is a special version of find for searching a single char in a string. Using a one-letter needle string is more

Re: faster splitter

2016-06-01 Thread Andrei Alexandrescu via Digitalmars-d
On 06/01/2016 08:41 AM, Seb wrote: On Wednesday, 1 June 2016 at 12:14:07 UTC, Patrick Schluter wrote: On Tuesday, 31 May 2016 at 17:54:34 UTC, qznc wrote: There is a special version of find for searching a single char in a string. Using a one-letter needle string is more like a user mistake

Re: faster splitter

2016-06-01 Thread Chris via Digitalmars-d
On Wednesday, 1 June 2016 at 12:41:19 UTC, Seb wrote: On Wednesday, 1 June 2016 at 12:14:07 UTC, Patrick Schluter wrote: On Tuesday, 31 May 2016 at 17:54:34 UTC, qznc wrote: There is a special version of find for searching a single char in a string. Using a one-letter needle string is more

Re: faster splitter

2016-06-01 Thread Seb via Digitalmars-d
On Wednesday, 1 June 2016 at 12:14:07 UTC, Patrick Schluter wrote: On Tuesday, 31 May 2016 at 17:54:34 UTC, qznc wrote: There is a special version of find for searching a single char in a string. Using a one-letter needle string is more like a user mistake than something to optimize for. At

Re: faster splitter

2016-06-01 Thread Patrick Schluter via Digitalmars-d
On Tuesday, 31 May 2016 at 17:54:34 UTC, qznc wrote: There is a special version of find for searching a single char in a string. Using a one-letter needle string is more like a user mistake than something to optimize for. At compile time you may not know the length of the needle, like in the

Re: faster splitter

2016-06-01 Thread Chris via Digitalmars-d
On Tuesday, 31 May 2016 at 21:29:34 UTC, Andrei Alexandrescu wrote: You may want to then try https://dpaste.dzfl.pl/392710b765a9, which generates inline code on all compilers. -- Andrei I've added it as `Andrei3`. It runs faster with dmd, but it's not as good with ldc. Seems like ldc

Re: faster splitter

2016-05-31 Thread David Nadlinger via Digitalmars-d
On Tuesday, 31 May 2016 at 21:29:34 UTC, Andrei Alexandrescu wrote: You may want to then try https://dpaste.dzfl.pl/392710b765a9, which generates inline code on all compilers. -- Andrei In general, it might be beneficial to use ldc.intrinsics.llvm_expect (cf. __builtin_expect) for things like

Re: faster splitter

2016-05-31 Thread Andrei Alexandrescu via Digitalmars-d
On 05/31/2016 04:18 PM, Chris wrote: I actually thought that dmd didn't place `computeSkip` inside of the loop. This begs the question if it should be moved to the loop, in case we use it in Phobos, to make sure that it is as fast as possible even with dmd. However, I like it the way it is now.

Re: faster splitter

2016-05-31 Thread Chris via Digitalmars-d
On Tuesday, 31 May 2016 at 19:59:50 UTC, qznc wrote: On Tuesday, 31 May 2016 at 19:29:25 UTC, Chris wrote: Would it speed things up even more, if we put the function `computeSkip` into the loop or is this done automatically by the compiler? LDC inlines it. DMD does not. More numbers:

Re: faster splitter

2016-05-31 Thread qznc via Digitalmars-d
On Tuesday, 31 May 2016 at 19:29:25 UTC, Chris wrote: Would it speed things up even more, if we put the function `computeSkip` into the loop or is this done automatically by the compiler? LDC inlines it. DMD does not. More numbers: ./benchmark.ldc Search in Alice in Wonderland std:

Re: faster splitter

2016-05-31 Thread Chris via Digitalmars-d
On Tuesday, 31 May 2016 at 18:56:14 UTC, qznc wrote: The mistake is to split on "," instead of ','. The slow splitter at the start of this thread searches for "\r\n". Your lazy-skip algorithm looks great! ./benchmark.ldc Search in Alice in Wonderland std: 168 ±6 +29 ( 107) -3

Re: faster splitter

2016-05-31 Thread qznc via Digitalmars-d
On Tuesday, 31 May 2016 at 18:31:47 UTC, Andrei Alexandrescu wrote: On 5/31/16 1:54 PM, qznc wrote: Using a one-letter needle string is more like a user mistake than something to optimize for. How is splitting on ',' a user mistake? -- Andrei The mistake is to split on "," instead of ','.

Re: faster splitter

2016-05-31 Thread Andrei Alexandrescu via Digitalmars-d
On 5/31/16 1:54 PM, qznc wrote: On Tuesday, 31 May 2016 at 01:55:16 UTC, Andrei Alexandrescu wrote: I agree it's difficult to characterize the behavior of substring search with one number. There are many dimensions of variation. (But there's no reason for an emotional response.) A few possible

Re: faster splitter

2016-05-31 Thread Andrei Alexandrescu via Digitalmars-d
On 5/31/16 1:54 PM, qznc wrote: Using a one-letter needle string is more like a user mistake than something to optimize for. How is splitting on ',' a user mistake? -- Andrei

Re: faster splitter

2016-05-31 Thread qznc via Digitalmars-d
On Tuesday, 31 May 2016 at 01:55:16 UTC, Andrei Alexandrescu wrote: I agree it's difficult to characterize the behavior of substring search with one number. There are many dimensions of variation. (But there's no reason for an emotional response.) A few possible baselines come to mind: *

Re: faster splitter

2016-05-31 Thread qznc via Digitalmars-d
On Tuesday, 31 May 2016 at 17:31:29 UTC, Wyatt wrote: qznc wins DMD (and is faster than the LDC's best? Careful! These are not absolute numbers, but relative slowdowns. You cannot compare the numbers between LDC and DMD.

Re: faster splitter

2016-05-31 Thread Wyatt via Digitalmars-d
On Tuesday, 31 May 2016 at 08:43:59 UTC, Chris wrote: On Monday, 30 May 2016 at 22:16:27 UTC, qznc wrote: And Desktop: ./benchmark.ldc std: 129 ±24+40 (3121) -17 (6767) manual: 129 ±31+59 (2668) -21 (7244) qznc: 112 ±14+30 (2542) -9 (7312) Chris: 134 ±33

Re: faster splitter

2016-05-31 Thread Chris via Digitalmars-d
On Monday, 30 May 2016 at 22:16:27 UTC, qznc wrote: And Desktop: ./benchmark.ldc std: 129 ±24+40 (3121) -17 (6767) manual: 129 ±31+59 (2668) -21 (7244) qznc: 112 ±14+30 (2542) -9 (7312) Chris: 134 ±33+58 (2835) -23 (7068) Andrei: 123 ±27+53

Re: faster splitter

2016-05-30 Thread Andrei Alexandrescu via Digitalmars-d
On 05/30/2016 06:16 PM, qznc wrote: What function call should be replaced with inline code? The call to computeSkip. Overall, I'm very confused and displeased by the benchmark right now. I agree it's difficult to characterize the behavior of substring search with one number. There are

Re: faster splitter

2016-05-30 Thread qznc via Digitalmars-d
On Monday, 30 May 2016 at 20:08:46 UTC, Andrei Alexandrescu wrote: On 05/30/2016 04:00 PM, Chris wrote: ./benchmark.dmd std: 178 ±31+36 (4475) -29 (5344) manual: 167 ±46+82 (2883) -32 (7054) qznc: 114 ±7 +18 (1990) -5 (7288) Chris: 228 ±49+82 (3050)

Re: faster splitter

2016-05-30 Thread Andrei Alexandrescu via Digitalmars-d
On 05/30/2016 04:00 PM, Chris wrote: ./benchmark.dmd std: 178 ±31+36 (4475) -29 (5344) manual: 167 ±46+82 (2883) -32 (7054) qznc: 114 ±7 +18 (1990) -5 (7288) Chris: 228 ±49+82 (3050) -35 (6780) Andrei: 103 ±5 +47 ( 658) -2 (9295) (avg

Re: faster splitter

2016-05-30 Thread Chris via Digitalmars-d
On Monday, 30 May 2016 at 18:57:15 UTC, Chris wrote: On Monday, 30 May 2016 at 18:20:39 UTC, Andrei Alexandrescu wrote: Please throw this hat into the ring as well: it should improve average search on large vocabulary dramatically. https://dpaste.dzfl.pl/dc8dc6e1eb53 It uses a BM-inspired

Re: faster splitter

2016-05-30 Thread Chris via Digitalmars-d
On Monday, 30 May 2016 at 18:20:39 UTC, Andrei Alexandrescu wrote: Please throw this hat into the ring as well: it should improve average search on large vocabulary dramatically. https://dpaste.dzfl.pl/dc8dc6e1eb53 It uses a BM-inspired trick - once the last characters matched, if the

Re: faster splitter

2016-05-30 Thread Andrei Alexandrescu via Digitalmars-d
On 05/30/2016 05:31 AM, qznc wrote: On Sunday, 29 May 2016 at 21:07:21 UTC, qznc wrote: When looking at the assembly I don't like the single-byte loads. Since string (ubyte[] here) is of extraordinary importance, it should be worthwhile to use word loads [0] instead. Really fancy would be SSE.

Re: faster splitter

2016-05-30 Thread qznc via Digitalmars-d
On Sunday, 29 May 2016 at 21:07:21 UTC, qznc wrote: When looking at the assembly I don't like the single-byte loads. Since string (ubyte[] here) is of extraordinary importance, it should be worthwhile to use word loads [0] instead. Really fancy would be SSE. So far, the results look

Re: faster splitter

2016-05-30 Thread Chris via Digitalmars-d
It's weird, on my machine, my find function is consistently faster than `manual find` LDC: std find: 137 ±22+33 (3580) -17 (6251) manual find: 137 ±32+53 (3105) -23 (6834) qznc find: 106 ±8 +17 (2608) -5 (7195) Chris find: 132 ±32+58 (2803) -23 (7131) Andrei

Re: faster splitter

2016-05-29 Thread Jon Degenhardt via Digitalmars-d
On Sunday, 29 May 2016 at 21:07:21 UTC, qznc wrote: On Sunday, 29 May 2016 at 18:50:40 UTC, Jon Degenhardt wrote: A minor thing - you might consider also calculating the median and median version of MAD (median of absolute deviations from the median). The reason is that benchmarks often have

Re: faster splitter

2016-05-29 Thread qznc via Digitalmars-d
On Sunday, 29 May 2016 at 18:50:40 UTC, Jon Degenhardt wrote: A minor thing - you might consider also calculating the median and median version of MAD (median of absolute deviations from the median). The reason is that benchmarks often have outliers in the max time dimension, median will do a

Re: faster splitter

2016-05-29 Thread Jon Degenhardt via Digitalmars-d
On Sunday, 29 May 2016 at 12:22:23 UTC, qznc wrote: I played around with the benchmark. Some more numbers: The mean slowdown is 114, which means 14% slower than the fastest one. The mean absolute deviation (MAD) is 23. More precisely, the mean deviation above the mean slowdown of 103 is 100

Re: faster splitter

2016-05-29 Thread Chris via Digitalmars-d
On Sunday, 29 May 2016 at 12:22:23 UTC, qznc wrote: I played around with the benchmark. Some more numbers: $ make ldc ldmd2 -O -release -inline -noboundscheck *.d -ofbenchmark.ldc ./benchmark.ldc E: wrong result with Chris find E: wrong result with Chris find E: wrong result with Chris find

Re: faster splitter

2016-05-29 Thread qznc via Digitalmars-d
I played around with the benchmark. Some more numbers: $ make ldc ldmd2 -O -release -inline -noboundscheck *.d -ofbenchmark.ldc ./benchmark.ldc E: wrong result with Chris find E: wrong result with Chris find E: wrong result with Chris find std find: 153 ±25+66 (1934) -15 (7860) manual

Re: faster splitter

2016-05-28 Thread qznc via Digitalmars-d
On Saturday, 28 May 2016 at 14:18:10 UTC, Chris wrote: I used dmd, because I don't have ldc on my laptop. qznc's find is clearly the winner. DMD performance feels flaky to me. If performance is important, you should use ldc or gdc. Alternatively, you are Walter Bright and simply optimize

Re: faster splitter

2016-05-28 Thread qznc via Digitalmars-d
On Saturday, 28 May 2016 at 12:47:59 UTC, Andrei Alexandrescu wrote: On 5/28/16 6:56 AM, qznc wrote: The sentinel value is `needleBack+1`, but range elements need not support addition. Finding a sentinel is hard and most certainly requires more assumptions about the ranges. No need for a

Re: faster splitter

2016-05-28 Thread Chris via Digitalmars-d
On Saturday, 28 May 2016 at 12:47:59 UTC, Andrei Alexandrescu wrote: On 5/28/16 6:56 AM, qznc wrote: The sentinel value is `needleBack+1`, but range elements need not support addition. Finding a sentinel is hard and most certainly requires more assumptions about the ranges. No need for a

Re: faster splitter

2016-05-28 Thread Chris via Digitalmars-d
On Friday, 27 May 2016 at 21:31:48 UTC, David Nadlinger wrote: Just about the only reason I could think of for this to happen is if the compiler fails to inline the range primitives from std.array. Otherwise, the loops should be pretty much equivalent to LLVM's optimiser. This is so

Re: faster splitter

2016-05-28 Thread Andrei Alexandrescu via Digitalmars-d
On 5/28/16 6:56 AM, qznc wrote: The sentinel value is `needleBack+1`, but range elements need not support addition. Finding a sentinel is hard and most certainly requires more assumptions about the ranges. No need for a sentinel really so long as you first search for the last element of the

Re: faster splitter

2016-05-28 Thread qznc via Digitalmars-d
On Saturday, 28 May 2016 at 01:30:11 UTC, Andrei Alexandrescu wrote: On 05/27/2016 06:19 PM, qznc wrote: manual find: 118 ±24 qznc find: 109 ±13 <--- using the sentinel trick Chris find: 142 ±27 It is normal that the numbers of the other tests change, since those are relative to the

Re: faster splitter

2016-05-28 Thread Chris via Digitalmars-d
On Friday, 27 May 2016 at 20:39:03 UTC, qznc wrote: Now added Chris' algo to the benchmark: std find:155 ±33 manual find: 112 ±19 qznc find: 122 ±18 <--- improved find Chris find: 133 ±20 <--- findStringS_ Did you fix the bugs in my algorithm? S

Re: faster splitter

2016-05-27 Thread Andrei Alexandrescu via Digitalmars-d
On 05/27/2016 06:19 PM, qznc wrote: manual find: 118 ±24 qznc find: 109 ±13 <--- using the sentinel trick Chris find: 142 ±27 It is normal that the numbers of the other tests change, since those are relative to the fastest one in each run. When qznc find 'wins' more often, the others get

Re: faster splitter

2016-05-27 Thread qznc via Digitalmars-d
On Friday, 27 May 2016 at 18:21:06 UTC, Andrei Alexandrescu wrote: What you want to do here for indexed access is to match the last element first. If no match, continue etc. If there's a match, enter an inner loop where you don't need to check for the end of the haystack. -- Andrei Another

Re: faster splitter

2016-05-27 Thread David Nadlinger via Digitalmars-d
On Friday, 27 May 2016 at 14:06:09 UTC, Chris wrote: I have to correct myself. A manual loop is faster, I couldn't believe it either, so I benchmarked the same function with `foreach` and `for`. Turns out that `for` is _way_ faster. Just about the only reason I could think of for this to

Re: faster splitter

2016-05-27 Thread qznc via Digitalmars-d
On Friday, 27 May 2016 at 20:50:52 UTC, Andrei Alexandrescu wrote: On 05/27/2016 04:39 PM, qznc wrote: Now added Chris' algo to the benchmark: std find:155 ±33 manual find: 112 ±19 qznc find: 122 ±18 <--- improved find Chris find: 133 ±20 <--- findStringS_ Does any of these feature

Re: faster splitter

2016-05-27 Thread Andrei Alexandrescu via Digitalmars-d
On 05/27/2016 04:39 PM, qznc wrote: Now added Chris' algo to the benchmark: std find:155 ±33 manual find: 112 ±19 qznc find: 122 ±18 <--- improved find Chris find: 133 ±20 <--- findStringS_ Does any of these feature simultaneously: * check the last element first * consequently stop

Re: faster splitter

2016-05-27 Thread qznc via Digitalmars-d
Now added Chris' algo to the benchmark: std find:155 ±33 manual find: 112 ±19 qznc find: 122 ±18 <--- improved find Chris find: 133 ±20 <--- findStringS_

Re: faster splitter

2016-05-27 Thread qznc via Digitalmars-d
On Friday, 27 May 2016 at 14:41:29 UTC, Chris wrote: The improved `std find` comes very close to the manual function above now, sometimes it's even faster or at least as fast. std findtook 12573666 manual find took 7418454 my std find took 6903854 <=== findStringS

Re: faster splitter

2016-05-27 Thread Patrick Schluter via Digitalmars-d
On Friday, 27 May 2016 at 19:17:44 UTC, Chris wrote: Oops, I've been found out. :-) Thanks. You're right of course, and I've already noticed that bug as it fails on not found. I got the bounds wrong. I had the same "bug" when I wrote my search function on the project at work. I also found

Re: faster splitter

2016-05-27 Thread Chris via Digitalmars-d
Oops, I've been found out. :-) Thanks. You're right of course, and I've already noticed that bug as it fails on not found. I got the bounds wrong.

Re: faster splitter

2016-05-27 Thread Patrick Schluter via Digitalmars-d
On Friday, 27 May 2016 at 14:41:29 UTC, Chris wrote: On Friday, 27 May 2016 at 14:06:09 UTC, Chris wrote: This outperforms both `manual_find` and the improved `std find` string findStringS_Manual(string haystack, string needle) { if (needle.length > haystack.length)

Re: faster splitter

2016-05-27 Thread Andrei Alexandrescu via Digitalmars-d
On 5/27/16 10:41 AM, Chris wrote: The improved `std find` comes very close to the manual function above now, sometimes it's even faster or at least as fast. What you want to do here for indexed access is to match the last element first. If no match, continue etc. If there's a match, enter an

Re: faster splitter

2016-05-27 Thread qznc via Digitalmars-d
On Friday, 27 May 2016 at 14:41:29 UTC, Chris wrote: The improved `std find` comes very close to the manual function above now, sometimes it's even faster or at least as fast. I think it really depends on the use case. The manual algorithm is really naive and std-find is slightly more

Re: faster splitter

2016-05-27 Thread qznc via Digitalmars-d
On Friday, 27 May 2016 at 14:06:09 UTC, Chris wrote: I think it's clear that `std find` is vry slow. If anyone wants to test my function, please do so. I don't want to spread wrong data, but this is what I get at the moment (ldc latest version). If you want to see find (current or my

Re: faster splitter

2016-05-27 Thread Chris via Digitalmars-d
On Friday, 27 May 2016 at 14:06:09 UTC, Chris wrote: This outperforms both `manual_find` and the improved `std find` string findStringS_Manual(string haystack, string needle) { if (needle.length > haystack.length) return haystack[$..$]; outer:

Re: faster splitter

2016-05-27 Thread Chris via Digitalmars-d
On Friday, 27 May 2016 at 13:29:00 UTC, Andrei Alexandrescu wrote: On 5/27/16 8:28 AM, Chris wrote: This is surprising. It should be easy to construct examples where brute force search performs arbitrarily poor, e.g. searching for "aaa...aab" in a long string with only "a"s. I will look

Re: faster splitter

2016-05-27 Thread Andrei Alexandrescu via Digitalmars-d
On 5/27/16 8:28 AM, Chris wrote: I've played around with some algorithms and kept them as simple as possible, but I've found that a linear brute force for-loop always beats them (it's the extra decision(s), I suppose). It looks nice in theory, but in hardware reality a stupid loop is more

Re: faster splitter

2016-05-27 Thread Chris via Digitalmars-d
On Monday, 23 May 2016 at 22:19:18 UTC, Andrei Alexandrescu wrote: On 05/23/2016 03:11 PM, qznc wrote: Conventional wisdom has it that find() is brute force and that's that, but probably it's time to destroy. Selectively using advanced searching algorithms for the appropriate inputs is very

Re: faster splitter

2016-05-25 Thread Chris via Digitalmars-d
On Wednesday, 25 May 2016 at 11:30:33 UTC, qznc wrote: On Wednesday, 25 May 2016 at 10:45:35 UTC, qznc wrote: On Wednesday, 25 May 2016 at 10:28:47 UTC, Chris wrote: Any suggestions how we can make find more efficient? I will send a pull request soon. My current patch [0] improves it, but

Re: faster splitter

2016-05-25 Thread qznc via Digitalmars-d
On Wednesday, 25 May 2016 at 10:45:35 UTC, qznc wrote: On Wednesday, 25 May 2016 at 10:28:47 UTC, Chris wrote: Any suggestions how we can make find more efficient? I will send a pull request soon. My current patch [0] improves it, but still contains a bug. [0]

Re: faster splitter

2016-05-25 Thread qznc via Digitalmars-d
On Wednesday, 25 May 2016 at 10:28:47 UTC, Chris wrote: Any suggestions how we can make find more efficient? I will send a pull request soon. My current patch [0] improves it, but still contains a bug. [0]

Re: faster splitter

2016-05-25 Thread Chris via Digitalmars-d
On Tuesday, 24 May 2016 at 22:01:17 UTC, qznc wrote: On Tuesday, 24 May 2016 at 19:47:52 UTC, qznc wrote: I discovered radare2. While it is actually a debugger (like gdb), it includes a nice disassembler. Example view, which shows the loop calling std.algorithm.find repeatedly: |

Re: faster splitter

2016-05-24 Thread qznc via Digitalmars-d
On Tuesday, 24 May 2016 at 19:47:52 UTC, qznc wrote: PS: Any recommendations for a Linux dissassembler? Objdump is ok, but some arrows for jumps would be nice. I discovered radare2. While it is actually a debugger (like gdb), it includes a nice disassembler. Example view, which shows the

Re: faster splitter

2016-05-24 Thread Andrei Alexandrescu via Digitalmars-d
On 05/24/2016 03:47 PM, qznc wrote: PS: Any recommendations for a Linux dissassembler? Objdump is ok, but some arrows for jumps would be nice. Try http://asm.dlang.org for short tests. -- Andrei

Re: faster splitter

2016-05-24 Thread qznc via Digitalmars-d
On Tuesday, 24 May 2016 at 10:44:12 UTC, qznc wrote: Unfortunately, it is slower. My current measurements [0]: $ ./benchmark 1000 10 # large haystack, few iterations std findtook753837231 manual find took129561980 $ ./benchmark 10 1000 # small haystack, many iterations std

Re: faster splitter

2016-05-24 Thread Andrei Alexandrescu via Digitalmars-d
On 05/24/2016 08:38 AM, Andrei Alexandrescu wrote: One somewhat unrelated thing that can be done with Boyer-Moore: use a constant-length skip array instead of dynamic allocation. Upon saturation, continue with brute force. -- Andrei Another idea: in every searched string there is one specific

Re: faster splitter

2016-05-24 Thread Andrei Alexandrescu via Digitalmars-d
On 05/24/2016 03:54 AM, qznc wrote: On Monday, 23 May 2016 at 22:19:18 UTC, Andrei Alexandrescu wrote: On 05/23/2016 03:11 PM, qznc wrote: Actually, std find should be faster, since it could use the Boyer Moore algorithm instead of naive string matching. Conventional wisdom has it that

Re: faster splitter

2016-05-24 Thread Andrei Alexandrescu via Digitalmars-d
On 05/24/2016 06:44 AM, qznc wrote: On Tuesday, 24 May 2016 at 09:34:30 UTC, Chris wrote: On Tuesday, 24 May 2016 at 07:54:38 UTC, qznc wrote: Apart from advanced algorithms, find should not be slower than a naive nested-for-loop implementation. [0]

Re: faster splitter

2016-05-24 Thread qznc via Digitalmars-d
On Tuesday, 24 May 2016 at 09:34:30 UTC, Chris wrote: On Tuesday, 24 May 2016 at 07:54:38 UTC, qznc wrote: Apart from advanced algorithms, find should not be slower than a naive nested-for-loop implementation. [0] https://issues.dlang.org/show_bug.cgi?id=16066 From Phobos [1]: /*

Re: faster splitter

2016-05-24 Thread Chris via Digitalmars-d
On Tuesday, 24 May 2016 at 07:54:38 UTC, qznc wrote: On Monday, 23 May 2016 at 22:19:18 UTC, Andrei Alexandrescu wrote: On 05/23/2016 03:11 PM, qznc wrote: Actually, std find should be faster, since it could use the Boyer Moore algorithm instead of naive string matching. Conventional wisdom

Re: faster splitter

2016-05-24 Thread Chris via Digitalmars-d
On Monday, 23 May 2016 at 22:19:18 UTC, Andrei Alexandrescu wrote: On 05/23/2016 03:11 PM, qznc wrote: Actually, std find should be faster, since it could use the Boyer Moore algorithm instead of naive string matching. Conventional wisdom has it that find() is brute force and that's that,

Re: faster splitter

2016-05-24 Thread qznc via Digitalmars-d
On Monday, 23 May 2016 at 22:19:18 UTC, Andrei Alexandrescu wrote: On 05/23/2016 03:11 PM, qznc wrote: Actually, std find should be faster, since it could use the Boyer Moore algorithm instead of naive string matching. Conventional wisdom has it that find() is brute force and that's that,

Re: faster splitter

2016-05-23 Thread Andrei Alexandrescu via Digitalmars-d
On 05/23/2016 03:11 PM, qznc wrote: Actually, std find should be faster, since it could use the Boyer Moore algorithm instead of naive string matching. Conventional wisdom has it that find() is brute force and that's that, but probably it's time to destroy. Selectively using advanced

Re: faster splitter

2016-05-23 Thread Andrei Alexandrescu via Digitalmars-d
On 05/23/2016 03:11 PM, Jack Stouffer wrote: (I think it's a micro optimization at best) splitter in the stdlib is likely to be very frequently useful, any bit of speedup we put in it is likely to pay off immensely. -- Andrei

Re: faster splitter

2016-05-23 Thread qznc via Digitalmars-d
On Sunday, 22 May 2016 at 18:56:30 UTC, qznc wrote: On Monday, 4 March 2013 at 19:11:17 UTC, Steven Schveighoffer wrote: On Mon, 04 Mar 2013 12:35:23 -0500, Andrei Alexandrescu wrote: That's comparable, please file an enh request.

Re: faster splitter

2016-05-23 Thread Jack Stouffer via Digitalmars-d
On Monday, 23 May 2016 at 14:47:22 UTC, qznc wrote: I see three options: 1. Remove dead bookkeeping code 2. Implement back() and popBack() 3. Use alternative splitter implementation (and implement back() and popBack()) The third one would be the best, if it is really faster. If the

Re: faster splitter

2016-05-23 Thread Andrei Alexandrescu via Digitalmars-d
On 05/23/2016 10:25 AM, Jack Stouffer wrote: On Monday, 23 May 2016 at 14:17:59 UTC, Andrei Alexandrescu wrote: What would be the condition tested in the static if? I would assume `static if (isBidirectionalRange!Range)` Recall that sometimes you do want a forward range even when you could

Re: faster splitter

2016-05-23 Thread Andrei Alexandrescu via Digitalmars-d
On 05/23/2016 10:47 AM, qznc wrote: It would be strictly better to implement the back and popBack, no? If it adds no overhead, you're right. -- Andrei

Re: faster splitter

2016-05-23 Thread qznc via Digitalmars-d
On Monday, 23 May 2016 at 14:25:52 UTC, Jack Stouffer wrote: On Monday, 23 May 2016 at 14:17:59 UTC, Andrei Alexandrescu wrote: What would be the condition tested in the static if? I would assume `static if (isBidirectionalRange!Range)` Recall that sometimes you do want a forward range even

Re: faster splitter

2016-05-23 Thread Jack Stouffer via Digitalmars-d
On Monday, 23 May 2016 at 14:17:59 UTC, Andrei Alexandrescu wrote: What would be the condition tested in the static if? I would assume `static if (isBidirectionalRange!Range)` Recall that sometimes you do want a forward range even when you could define a bidirectional range. -- Andrei I

Re: faster splitter

2016-05-23 Thread Andrei Alexandrescu via Digitalmars-d
On 5/23/16 9:44 AM, Jack Stouffer wrote: On Monday, 23 May 2016 at 13:20:55 UTC, Andrei Alexandrescu wrote: Most uses are forward, so perhaps it's best to eliminate the bidirectional bookkeeping if it adds overhead, and confine those to a separate function e.g. splitterBidirectional. There is

Re: faster splitter

2016-05-23 Thread Jack Stouffer via Digitalmars-d
On Monday, 23 May 2016 at 13:20:55 UTC, Andrei Alexandrescu wrote: Most uses are forward, so perhaps it's best to eliminate the bidirectional bookkeeping if it adds overhead, and confine those to a separate function e.g. splitterBidirectional. There is precedent for that in Phobos. -- Andrei

Re: faster splitter

2016-05-23 Thread Andrei Alexandrescu via Digitalmars-d
On 05/23/2016 08:17 AM, qznc wrote: On Monday, 23 May 2016 at 12:01:52 UTC, qznc wrote: Additionally, there is this weird special case for a bidirectional range, which just adds unnecessary overhead. Is "remove dead code" a good enough reason in itself for a PR? Forget the "dead code comment"

Re: faster splitter

2016-05-23 Thread qznc via Digitalmars-d
On Monday, 23 May 2016 at 12:01:52 UTC, qznc wrote: Additionally, there is this weird special case for a bidirectional range, which just adds unnecessary overhead. Is "remove dead code" a good enough reason in itself for a PR? Forget the "dead code comment" it is a actually a missing

Re: faster splitter

2016-05-23 Thread Seb via Digitalmars-d
On Monday, 23 May 2016 at 12:01:52 UTC, qznc wrote: On Monday, 23 May 2016 at 11:52:35 UTC, Seb wrote: On Sunday, 22 May 2016 at 18:56:30 UTC, qznc wrote: On Monday, 4 March 2013 at 19:11:17 UTC, Steven Schveighoffer wrote: [...] Below is an implementation, which matches MySplitter with

Re: faster splitter

2016-05-23 Thread qznc via Digitalmars-d
On Monday, 23 May 2016 at 11:52:35 UTC, Seb wrote: On Sunday, 22 May 2016 at 18:56:30 UTC, qznc wrote: On Monday, 4 March 2013 at 19:11:17 UTC, Steven Schveighoffer wrote: [...] Below is an implementation, which matches MySplitter with dmd, but not with ldc. More precisely: [...] have

Re: faster splitter

2016-05-23 Thread Seb via Digitalmars-d
On Sunday, 22 May 2016 at 18:56:30 UTC, qznc wrote: On Monday, 4 March 2013 at 19:11:17 UTC, Steven Schveighoffer wrote: [...] Below is an implementation, which matches MySplitter with dmd, but not with ldc. More precisely: [...] have you thought about opening a PR to improve

faster splitter

2016-05-22 Thread qznc via Digitalmars-d
On Monday, 4 March 2013 at 19:11:17 UTC, Steven Schveighoffer wrote: On Mon, 04 Mar 2013 12:35:23 -0500, Andrei Alexandrescu wrote: That's comparable, please file an enh request. http://d.puremagic.com/issues/show_bug.cgi?id=9646 Below is an