Re: A path commonPrefix function
@cblake May I draw your attention to [https://forum.nim-lang.org/t/5734#35803](https://forum.nim-lang.org/t/5734#35803)? The differences between `-d:danger` and `-d:release` are mysterious to us.
Re: A path commonPrefix function
Things in that last version weren't robust/efficient with regard to zero-length strings being in the mix. This fixes that:

```nim
proc lcpLenVRange(strs: openArray[string]): int =
  if strs.len == 0: return -1      # raise?
  if strs[0].len == 0: return 0    # ANY "" anywhere => 0
  let byte0 = strs[0][0]
  var minAt, maxAt: int
  for i in 1 ..< strs.len:
    if strs[i].len == 0: return 0
    if strs[i] < strs[minAt]: minAt = i
    if strs[i] > strs[maxAt]: maxAt = i
    if strs[i][0] != byte0: return 0
  for i, c in strs[minAt]:
    if c != strs[maxAt][i]: return i
  return strs[minAt].len
```

In the interests of being mostly self-contained, I also pushed a better `timeIt` to `cligen#head` and updated my driver to use the new `timeIt` and to **not** time the `stdout.write` parts:

```nim
proc timeAlgo(algo=lcpVRange, reps=1, strs: seq[string]) =
  var x: int
  timeIt(stdout.write, "", 1e-6, 3, " usec via " & ($algo)[3..^1], reps):
    x += strs.lcpLen(algo)
  echo " ", int(x.float / reps.float + 0.5)
```

Finally, I created a `benchLcp.zsh` script:

```zsh
#!/bin/zsh
lcp=`pwd`/lcp  # We need Zsh only for its **
echo "L3 data, 19 answer"
for a in lcpVe lcpR lcpB lcpVR; do $lcp -r10 -a$a /usr/lib/python3.6/**; done
echo "L2 data, 9 answer"
for a in lcpVe lcpR lcpB lcpVR; do $lcp -r10 -a$a /usr/bin/*; done
echo "L3 data, 0 answer"
for a in lcpVe lcpR lcpB lcpVR; do (cd /usr/lib/python3.6; $lcp -r10 -a$a **); done
echo "L2 data, 0 answer"
for a in lcpVe lcpR lcpB lcpVR; do (cd /usr/bin; $lcp -r10 -a$a *); done
```

and used a little `nim-pgo` wrapper I have to use gcc's profile-guided optimization (`-fprofile-generate` and `-fprofile-use`): compiling, running a test/bench, then re-compiling.
With all that, I get:

```
L3 data, 19 answer
1910.448..2253.771 1959.491 +- 32.812 usec via Vertical 19
686.884..1150.131 777.125 +- 48.653 usec via Range 19
922.680..1522.064 996.113 +- 58.569 usec via BinSearch 19
693.560..1163.244 786.591 +- 49.932 usec via VRange 19
L2 data, 9 answer
41.246..63.181 45.848 +- 2.480 usec via Vertical 9
37.193..54.359 40.936 +- 1.778 usec via Range 9
22.650..25.511 23.413 +- 0.353 usec via BinSearch 9
36.716..55.075 41.509 +- 2.292 usec via VRange 9
L3 data, 0 answer
0.000..0.954 0.167 +- 0.094 usec via Vertical 0
535.488..1017.570 627.780 +- 48.740 usec via Range 0
58.651..205.994 82.874 +- 15.300 usec via BinSearch 0
0.715..3.576 1.097 +- 0.278 usec via VRange 0
L2 data, 0 answer
0.000..0.000 0.000 +- 0.000 usec via Vertical 0
32.663..38.862 33.927 +- 0.650 usec via Range 0
3.099..4.768 3.290 +- 0.166 usec via BinSearch 0
0.000..0.238 0.048 +- 0.032 usec via VRange 0
```

which makes total sense to me. @marks' performance discrepancies (except his mysterious `foldl` bit) probably just come from the "mix"/averaging of how many zero-length answers are in his inputs. So, this new variant of the VRange hybrid method would make the most sense to recommend for general usage (of the above selection, anyway). It's never that slow, fastest at large scale when performance matters most, and can leverage early exit in easy cases. The binary search way may be best for smaller-scale data if the caller already knows and can pass in a data-scale parameter. If we must measure the data size, that takes a whole pass through the length fields. That measurement eats into the 36/22 = 1.6x L2 non-zero speed ratio, probably reducing it to something like 1.3x, which isn't much of a boost for all the complexity of measurement + algorithm switching. I got about a 1.15x boost from PGO, but most would probably skip that entirely to avoid build complexity.
Re: A path commonPrefix function
For explicitness:

```nim
proc lcpLenVRange(strs: openArray[string]): int =
  if strs.len == 0: return -1  # raise?
  var minAt, maxAt: int
  for i in 1 ..< strs.len:
    if strs[i] < strs[minAt]: minAt = i
    if strs[i] > strs[maxAt]: maxAt = i
    if strs[i].len > 0 and strs[i][0] != strs[0][0]: return 0
  for i, c in strs[minAt]:
    if c != strs[maxAt][i]: return i
  return strs[minAt].len
```

Integrating that into the test harness appropriately, running on my two running data input examples (extended to a case with a zero-length common prefix), and switching to `mnmx` (which gives the range of times instead of mean +- sdev) then gives:

```
$ for a in lcpVe lcpR lcpB lcpVR; { (repeat 20; lcp -a$a /usr/lib/python3.6/**)|mnmx }
19 in 2300.262..2448.32 microseconds via lcpVertical
19 in 1122.236..1162.767 microseconds via lcpRange
19 in 1523.256..1558.781 microseconds via lcpBinSearch
19 in 1189.47..1224.756 microseconds via lcpVRange
$ for a in lcpVe lcpR lcpB lcpVR; { (repeat 20; lcp -a$a /usr/bin/*)|mnmx }
9 in 51.737..76.532 microseconds via lcpVertical
9 in 48.637..68.903 microseconds via lcpRange
9 in 36.24..40.054 microseconds via lcpBinSearch
9 in 46.253..54.598 microseconds via lcpVRange
$ for a in lcpVe lcpR lcpB lcpVR; { (cd /usr/bin; repeat 20; lcp -a$a *)|mnmx }
0 in 3.099..5.722 microseconds via lcpVertical
0 in 43.392..80.109 microseconds via lcpRange
0 in 8.583..13.113 microseconds via lcpBinSearch
0 in 3.099..5.245 microseconds via lcpVRange
$ for a in lcpVe lcpR lcpB lcpVR; { (cd /usr/lib/python3.6; repeat 20; lcp -a$a **)|mnmx }
0 in 7.868..11.683 microseconds via lcpVertical
0 in 911.713..959.158 microseconds via lcpRange
0 in 174.522..190.735 microseconds via lcpBinSearch
0 in 9.537..14.782 microseconds via lcpVRange
```

So, you can get the early exit of the vertical method for zero common prefixes in the range method without very much extra cost.
E.g., there is a slightly negative/in-the-noise cost for the L2 prefixLen==9 case and a 6% slowdown for the prefixLen==19 L3 case, with the early exits basically matching times in the two prefixLen==0 cases. (And, yeah, yeah.. the Nim program should `import stats`, do the repeated loops internally updating a `RunningStat`, and report. It could even take an `lcpAll` enum to report on all algorithms.. and, if one wants to commit to only file-tree inputs, maybe even take just a directory as a command parameter, `import oswalkdir`, and use `walkDir` and `walkDirRec` instead of `*` and `**` to remove the dependency on shell features.)
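A sketch of that internal-repeats idea using only the stdlib; `timeReps` and its report format are made up here for illustration, not the actual driver:

```nim
import stats, times

# Repeat a measurement in-process and accumulate per-rep times (in usec)
# into the stdlib's RunningStat for a mean +- sdev style report.
proc timeReps(reps: int; body: proc ()): RunningStat =
  for _ in 1 .. reps:
    let t0 = epochTime()
    body()
    result.push((epochTime() - t0) * 1e6)

when isMainModule:
  var s = ""
  # Trivial stand-in workload; a real driver would call strs.lcpLen here.
  let rs = timeReps(10, proc () = s.add 'x')
  echo rs.mean, " +- ", rs.standardDeviation, " usec over ", rs.n, " reps"
```

`RunningStat` keeps count, mean, and variance incrementally, so no per-rep sample buffer is needed.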
Re: A path commonPrefix function
(One could probably inline the range work into the lcp, keep track of just the first character, and exit early if it's ever different, to combine the best properties...)
Re: A path commonPrefix function
Oops. Didn't survive some edits. Good catch as always, @e. Will fix in the prior post with a note. It does not affect my results, which look like this:

```
$ echo /usr/bin/*|wc
13077 58306
$ (repeat 50; lcp -alcpV /usr/bin/*)|mnsd
9 in 53.97 +- 0.30 microseconds via lcpVertical
$ (repeat 50; lcp -alcpR /usr/bin/*)|mnsd
9 in 51.33 +- 0.42 microseconds via lcpRange
$ (repeat 50; lcp -alcpB /usr/bin/*)|mnsd
9 in 39.29 +- 0.78 microseconds via lcpBinSearch
```

(I use Zsh, which has `repeat`, and a little mean-sdev computer.) Then for larger inputs:

```
$ echo /usr/lib/python3.6/**|wc
1 39036 3108640
$ (repeat 10 lcp -alcpV /usr/lib/python3.6/**)|mnsd
19 in 2388. +- 14. microseconds via lcpVertical
$ (repeat 10 lcp -alcpR /usr/lib/python3.6/**)|mnsd
19 in 1131.1 +- 1.5 microseconds via lcpRange
$ (repeat 10 lcp -alcpB /usr/lib/python3.6/**)|mnsd
19 in 1523. +- 17. microseconds via lcpBinSearch
```

The L3 on an i7-6700k is 8 MB. So, this is showing the range method pulling ahead while still in L3 (3.1 MB). At one pass, it really has to be approximately the fastest at large scale. Reading from DIMMs, the difference should be even more pronounced in favor of range. It's possible that @marks' vertical scan style might be faster if the answer is very small. E.g., for a zero common prefix length, the answer will be concluded after just one pass down the first column via `lcpVertical`.
Re: A path commonPrefix function
In `rangeAt` you have

```nim
if xs[result.minAt] < xs[i]: result.minAt = i
if xs[result.minAt] > xs[i]: result.minAt = i
```

whereas I suspect you meant to use `maxAt` in the second line. Does this affect your timing results?
Re: A path commonPrefix function
It's a bit tricky to explain why your `foldl` would be much faster than my `minLen` (note "min" not "max"). Perhaps you aren't compiling with max optimization/`-d:danger` or something? Or perhaps it's some peculiarity related to your test data. Total data size, number of strings, and size of the found prefixes are more salient statistics than how many times you repeat, though another possible explanation is that 100k repeats in a tight loop does something weird with a warmed-up CPU branch predictor. My timings come from a Unix shell loop doing 10..100 repeated runs of the whole program. As with much benchmarking, it's debatable what is most representative. I'll include the whole program below for reference. I called @marks' algorithm `lcpVertical` for its vertically-oriented scanning and swapped `i` & `j` to have the more traditional `j = column index` notation and a slightly earlier exit. Because cligen allows unique prefixes, `./lcp -alcpb /usr/bin/\*` is a valid way to run things in binary search mode. I don't do the final `rfind(sep)` work in any of those, but it's the same work for all the algorithms anyway and just a couple-line wrapper `proc`. Anyway, I've run with a variety of inputs and things track my stated expectations perfectly (compiled with the devel version of Nim, `-d:danger`, gcc-9.2, on Linux, i7-6700k). For really large inputs the range algo works best; for really small ones, the binary search. So, an optimum at all scales would probably switch from binary search to range at some machine/system-load dependent threshold. Stdlib-wise, though, the `range` LCP algo is never worse than about 2x the run time on my various /usr/xyz/* type tests. So, from a non-tuning/performance-robustness point of view, it would be a better candidate for stdlib inclusion. Beyond the "one stop shopping" algo-wise and almost trivial implementation, its `rangeAt` & `range` helpers are much more basic/common operations than longest common prefix.
Beyond that basicness, as mentioned, the value-based `range` may afford strong speed-ups for specialized types. So, it might be helpful to have a single point of reference that could be optimized/assumed optimized the way certain standard C library functions are, such as `memchr`.

```nim
when defined(foldl):
  import sequtils  # For me this is slower
  proc minLen(strs: openArray[string]): int {.inline.} =
    foldl(strs.mapIt(len(it)), min(a, b))
else:
  proc minLen(strs: openArray[string]): int {.inline.} =
    result = int.high
    for s in strs: result = min(result, s.len)

proc check(strs: openArray[string]; j0, j1: int): bool =
  let first = strs[0]
  for s in strs:
    for j in j0 .. j1:
      if s[j] != first[j]: return false
  return true

proc lcpLenBinSearch(strs: openArray[string]): int =
  if strs.len == 0: return 0   # maybe -1 instead?
  if strs.len == 1: return strs[0].len
  var lo = 0                   # Binary search
  var hi = strs.minLen - 1
  while lo <= hi:
    let mid = lo + (hi - lo) div 2
    if check(strs, lo, mid):   # All strs match this
      result += mid + 1 - lo   # Append to answer
      lo = mid + 1             # Go higher
    else:                      # Go lower
      hi = mid - 1

proc rangeAt*[T](xs: openArray[T]): tuple[minAt, maxAt: int] =
  if xs.len == 0: return (-1, -1)  # raise?
  for i in 1 ..< xs.len:
    if xs[result.minAt] < xs[i]: result.minAt = i
    if xs[result.minAt] > xs[i]: result.minAt = i

proc range*[T](xs: openArray[T]): tuple[min, max: T] =
  if xs.len == 0: return  # raise?
  let (minAt, maxAt) = xs.rangeAt
  result.min = xs[minAt]
  result.max = xs[maxAt]

proc lcpLenRange(strs: openArray[string]): int =
  if strs.len == 0: return -1  # raise?
  let (minAt, maxAt) = strs.rangeAt
  for i, c in strs[minAt]:
    if c != strs[maxAt][i]: return i
  return strs[minAt].len

proc lcpLenVertical(strs: openArray[string]): int =
  if strs.len == 0: return 0  # simplified marks algo
  let first = strs[0]
  for j in 0 ..< first.len:
    for i in 1 ..< strs.len:
      if j >= strs[i].len or first[j] != strs[i][j]: return j
  return first.len

type lcpAlgo* = enum lcpBinSearch, lcpRange, lcpVertical

proc lcpLen*(strs: openArray[string], algo=lcpVertical): int =
  case algo
  of lcpBinSearch: return strs.lcpLenBinSearch
  of lcpRange:     return strs.lcpLenRange
  of lcpVertical:  return strs.lcpLenVertical

proc lcp*(strs: openArray[string], algo=lcpBinSearch): string =
  if strs.len < 1: return ""
  result = strs[0][0 ..< strs.lcpLen(algo)]

when isMainModule:  # asserts are the Rosetta Code tests
  assert lcp(["interspecies", "interstellar", "interstate"]) == "inters"
  assert lcp(["thr
```
Re: A path commonPrefix function
Using your `for j in 1 .. paths.high()` is _much_ faster than my `for s in paths[1 .. ^1]`, but for me my `foldl` is much faster than your `maxLen`. (My test data isn't great though, a dozen calls, eleven with common prefixes, repeated 100K times.) Maybe a common prefix algorithm should be in the std. lib., perhaps with one version that is "raw" and another which does prefixes only at a given separator?
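The "raw plus separator-aware" split proposed above could look roughly like this; the names `commonPrefixLen` and `commonPathPrefix` are hypothetical, not an actual stdlib proposal:

```nim
import strutils  # for rfind

# "Raw" version: longest common prefix length over all strings.
proc commonPrefixLen(strs: openArray[string]): int =
  if strs.len == 0: return 0
  result = strs[0].len
  for s in strs:
    var i = 0
    while i < min(result, s.len) and s[i] == strs[0][i]: inc i
    result = i   # prefix can only shrink as more strings are seen

# Separator-aware version: clip the raw prefix back to the last `sep`.
proc commonPathPrefix(strs: openArray[string], sep = "/"): string =
  if strs.len == 0: return ""
  let n = strs.commonPrefixLen
  if n == 0: return ""
  # rfind returns -1 when sep is absent, making the slice empty.
  strs[0][0 .. strs[0].rfind(sep, 0, n - 1)]
```

Returning the length from the raw version leaves the single slice (or the `rfind` clipping) to the caller, which matches the "return an `int`, much cheaper" suggestion elsewhere in the thread.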
Re: A path commonPrefix function
And before someone corrects me, an even less embarrassing variant might be:

```nim
proc rangeAt*[T](xs: openArray[T]): tuple[minAt, maxAt: int] =
  if xs.len < 1: return (-1, -1)  # raise?
  for i in 1 ..< xs.len:          # note tuple ints init to zero
    if xs[result.minAt] < xs[i]: result.minAt = i
    if xs[result.minAt] > xs[i]: result.minAt = i
```

with then `let (minAt, maxAt) = strs.rangeAt` and tracking the `(min|max)` -> `strs[(min|max)At]` changes in the string comparison for the prefix length algo. Some paired-up value-oriented interface for those who want it would also make sense:

```nim
proc range*[T](xs: openArray[T]): tuple[min, max: T] {.inline.} =
  if xs.len < 1: return  # raise?
  let (minAt, maxAt) = xs.rangeAt
  result.min = xs[minAt]
  result.max = xs[maxAt]
```

in which case maybe no change to the prefix-length algo is needed. While off the topic of the prefix calculation, it perhaps bears noting that SSE/AVX can sometimes speed up the value-oriented versions of these range computations over arrays of CPU-supported types like `openArray[float32]` by large factors, 24x even. Basically, branchless and batching vector min/max instructions can be leveraged, but I am unaware of CPUs supporting "minAt"-type vector instructions. Recent versions of gcc can recognize these min/max value sweeps and correctly generate optimized code, but such pattern recognition can often fail as the code grows just a little more complex.
Re: A path commonPrefix function
> for s in paths[1 .. ^1]:

The above line really slows things down. Pretty sure it's creating a copy of that slice every outer loop. Change it to `for j in 1 ..< paths.len:` and use `paths[j][i]` below. I still timed your version as about 1.5x the run time of my version. That residual was attributable to your `foldl` vs. my `minLen`. It might be simpler to just use my version completely when adjudicating it. :-) Also, if you can, you should measure/say something about your expected total number of strings, average length (total data), and typical answer length. A less embarrassingly slow version of `range` for "expensive `T`" than my prior post is:

```nim
proc range*[T](xs: openArray[T]): tuple[min, max: T] =
  if xs.len < 1: return  # default vals for T
  var iMin, iMax: int    # zero init
  for i in 1 ..< xs.len:
    if xs[iMin] < xs[i]: iMin = i
    if xs[iMax] > xs[i]: iMax = i
  result.min = xs[iMin]
  result.max = xs[iMax]
```

With this new version, the one-pass version is only about 1.5x the run time of the binary search version on that easily-L2-cacheable running example of mine (3078 strings, avg len 17 bytes, prefix 9). Of course, as alluded to, the ratio of main DIMM bandwidth to L[12] cache bandwidth is almost always much greater than a small factor like 1.5; more like 4-5x. So, at some point the one-pass range/string-compare method will be fastest, but that point may not matter for your use case.
Re: A path commonPrefix function
I tried the binary search approach, but it was still much slower than my best version on my test data. For reference, here's what I tried:

```nim
import sequtils  # foldl, mapIt

proc commonPrefix*(paths: openArray[string], sep = "/"): string =
  if len(paths) < 1: return ""
  let first = paths[0]
  if len(first) < 2: return first
  var lo = 0
  var hi = foldl(paths.mapIt(len(it)), min(a, b)) - 1  # min len
  while lo <= hi:
    let mid = lo + (hi - lo) div 2
    var ok = true
    block loop:
      for s in paths[1 .. ^1]:
        for i in lo .. mid:
          if s[i] != first[i]:
            ok = false
            break loop
    if ok:
      result.add(first[lo .. mid])
      lo = mid + 1
    else:
      hi = mid - 1
```
Re: A path commonPrefix function
`result += mid + 1 - lo` is, indeed, way cheaper than `result.add`, but that binary search approach only does the `result.add` about `ceil(log_2(minLen))` times (i.e., like 3-10 times, probably). I couldn't even measure the difference on my 3078-entry, average-length-17 `/usr/bin/\*` example (i.e., the noise was >> the difference). Regardless, I **100% agree** that separating into a prefixLen and then taking that slice would be a nicer factoring anyway. The caller may only need to test the length against `0` or something. Another algorithm of interest in terms of this whole thread is the "just compare first & last in sorted order" approach. It is very "tidy" from both a conceptual and a coding point of view:

```nim
proc range*[T](xs: openArray[T]): tuple[min, max: T] =
  if xs.len < 1: return  # default vals for T
  result.min = xs[0]
  result.max = xs[0]
  for x in xs:
    result.min = min(result.min, x)
    result.max = max(result.max, x)

proc longestCommonPrefixLen*(strs: openArray[string]): int =
  let (min, max) = strs.range
  for i, c in min:
    if c != max[i]: return i
  return min.len
```

As written, that's about 6x slower than the binary search approach for L2-resident data. It's probably possible to really speed up `range` for `openArray[string]` with some shallow copies in the loop and a real copy at the end. That might be a fun exercise for someone. Indeed, for very large inputs, bandwidth starvation of modern CPUs will mean that doing only one pass over said inputs is going to be best, even if it does quite a bit more CPU ALU-type work. So, even if that `range` cannot be sped up to match/beat the binary search in this context, one would still want to switch to this (or some other) one-pass algo to accommodate giant inputs in a more general setting, probably past the L2 or L3 boundary (assuming it can run with minimal cache competition). @marks did not really discuss the size of his inputs or his use case, though his second, "faster on his test set" version is notably a multi-pass algorithm.
Re: A path commonPrefix function
I would return an `int` instead of a string, much cheaper.
Re: A path commonPrefix function
E.g., in Nim it might look like this:

```nim
proc minLen(strs: openArray[string]): int =
  result = int.high
  for s in strs: result = min(result, s.len)

proc check(strs: openArray[string]; j0, j1: int): bool =
  let srcSlc = strs[0][j0..j1]
  for s in strs:
    if s[j0..j1] != srcSlc: return false
  return true

proc longestCommonPfx*(strs: openArray[string]): string =
  if strs.len < 1: return ""
  let src = strs[0]
  if strs.len < 2: return src
  var lo = 0                  # Binary search
  var hi = strs.minLen - 1
  while lo <= hi:
    let mid = lo + (hi - lo) div 2
    if check(strs, lo, mid):      # All strs match this
      result.add src[lo .. mid]   # Append to answer
      lo = mid + 1                # Go higher
    else:                         # Go lower
      hi = mid - 1

when isMainModule:
  import os
  echo longestCommonPfx(commandLineParams())
```

There are also a lot of variants at [http://rosettacode.org/wiki/Longest_common_prefix](http://rosettacode.org/wiki/Longest_common_prefix). For paths, you would just need to add something to back up to the most recent directory separator, if any.
Re: A path commonPrefix function
Are you aware of this binary search approach? [https://www.tutorialcup.com/string/longest-common-prefix-using-biary-search.htm](https://www.tutorialcup.com/string/longest-common-prefix-using-biary-search.htm)
Re: A path commonPrefix function
This is the fastest and best so far:

```nim
proc commonPrefix*(paths: openArray[string], sep = "/"): string =
  if len(paths) == 0: return ""
  let first = paths[0]
  var index = -1
  block loop:
    for i in 0 ..< len(first):
      for j in 1 .. paths.high():
        let path = paths[j]
        if i >= len(path) or first[i] != path[i]:
          break loop
      index = i
  if index == -1:
    return ""
  else:
    return first[0 .. first.rfind(sep, 0, index)]
```
Re: A path commonPrefix function
Thanks for that. I discovered that my new version is 4x faster than the original.
Re: A path commonPrefix function
> Is there a Nim equivalent to Python's timeit module for timing snippets?

There are probably about 100 variants in people's local code, but I happened to just see your message and also to just add this to `cligen/osUt` the other day (needs `requires "cligen#head"` in your `.nimble` file, or you could just copy this tiny impl and adapt as you like):

```nim
import strutils, times

template timeIt*(label: string, unit: float, places=3, sep="\n", body: untyped) =
  let t0 = epochTime()
  body
  let dt = epochTime() - t0
  stdout.write label, " ", formatFloat(dt * unit, ffDecimal, places), sep

timeIt("write1", 1e6, 3, " ") : stderr.write "hi"
timeIt("write2", 1e6, 3, "\n"): stderr.write "there"
```

You may want to add loops to repeat and use the `stats.RunningStat` stdlib API to do something nicer, more like Python's version. There is also a package called golden ([https://github.com/disruptek/golden](https://github.com/disruptek/golden)) that tries to iterate until timing noise has been suppressed. And probably various other solutions, too. (And sometimes noise **is** the data and you want cold-cache/system-competition effects included... benchmarking is obviously a larger topic than a snippet thing.)
Re: A path commonPrefix function
This version walks forward char by char. Is there a Nim equivalent to Python's `timeit` module for timing snippets?

```nim
proc commonPrefix*(paths: openArray[string], sep = "/"): string =
  if len(paths) == 0: return ""
  result = paths[0]
  var index = 0
  var ok = false
  for path in paths[1 .. ^1]:
    while index < len(path) and index < len(result) and
        result[index] == path[index]:
      inc index
      ok = true
  result = if not ok: "" else: result[0 .. result.rfind(sep, 0, index)]
```
Re: A path commonPrefix function
When your commonPrefix takes 2 paths, both of length n with no common prefix, it has O(n^2) time complexity, because the `startsWith` proc is O(n) and it is called n times. The `relativePath` proc in the `os` module calculates the common prefix in O(n) time: [https://github.com/nim-lang/Nim/blob/devel/lib/pure/os.nim](https://github.com/nim-lang/Nim/blob/devel/lib/pure/os.nim)
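For contrast with the quadratic shrink-loop, a single forward scan stays linear in the total input size. This is just an illustrative two-string sketch, not the actual `os.relativePath` code:

```nim
# Linear-time common prefix of two strings: each character is examined
# at most once, so the cost is O(min(a.len, b.len)) rather than O(n^2).
proc commonPrefixLen(a, b: string): int =
  while result < min(a.len, b.len) and a[result] == b[result]:
    inc result
```

Folding this pairwise over a `seq[string]` keeps the whole computation at one pass per string, since the running prefix length can only shrink.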
Re: A path commonPrefix function
Something like this:

```nim
import strutils  # startsWith, rfind

proc commonPrefix*(paths: openArray[string], sep = "/"): string =
  if len(paths) > 0:
    result = paths[0]
    for i in 1 .. paths.high:
      let path = paths[i]
      while not path.startsWith(result) and len(result) > 0:
        result.setLen(result.high)
    result.setLen(result.rfind(sep) + 1)
```
A path commonPrefix function
I have the following `commonPrefix()` function (for whole paths) which works fine on Linux (and Windows), but I am wondering if it could be shorter and/or more efficient?

```nim
import strutils  # startsWith, rfind

proc commonPrefix*(paths: openArray[string], sep = "/"): string =
  if len(paths) == 0: return ""
  result = paths[0]
  for path in paths[1 .. ^1]:
    while not path.startsWith(result) and len(result) > 0:
      result = result[0 .. ^2]
  result = result[0 .. result.rfind(sep)]
```

Tests:

```nim
test "commonPrefix":
  check(commonPrefix(["/tmp/test1", "/tmp/data.dat"]) == "/tmp/")
  check(commonPrefix(["/home/mark/test1", "/tmp/data.dat"]) == "/")
  check(commonPrefix(["/home/mark/test1", "local/data.dat"]) == "")
```