Re: A path commonPrefix function

2020-01-07 Thread Araq
@cblake May I draw your attention to 
[https://forum.nim-lang.org/t/5734#35803](https://forum.nim-lang.org/t/5734#35803)
 ? The differences between `-d:danger` and `-d:release` a mysterious to us.


Re: A path commonPrefix function

2020-01-07 Thread cblake
Things in that last version weren't robust/efficient with regards to zero 
length strings being in the mix. This fixes that: 


proc lcpLenVRange(strs: openArray[string]): int =
  if strs.len == 0: return -1 # raise?
  if strs[0].len == 0: return 0   # ANY "" anywhere => 0
  let byte0 = strs[0][0]
  var minAt, maxAt: int
  for i in 1 ..< strs.len:
if strs[i].len == 0: return 0
if strs[i] < strs[minAt]: minAt = i
if strs[i] > strs[maxAt]: maxAt = i
if strs[i][0] != byte0: return 0
  for i, c in strs[minAt]:
if c != strs[maxAt][i]: return i
  return strs[minAt].len


Run

In the interests of being mostly self-contained, I also pushed a better 
`timeIt` to `cligen#head` and updated my driver to use the new `timeIt` and to 
**not** time the `stdout.write` parts: 


proc timeAlgo(algo=lcpVRange, reps=1, strs: seq[string]) =
  var x: int
  timeIt(stdout.write, "", 1e-6, 3, " usec via " & ($algo)[3..^1], reps):
x += strs.lcpLen(algo)
  echo " ", int(x.float / reps.float + 0.5)


Run

Finally, I created a `benchLcp.zsh` script: 


#!/bin/zsh
lcp=`pwd`/lcp   #We need Zsh only for its **
echo "L3 data, 19 answer"
for a in lcpVe lcpR lcpB lcpVR; do $lcp -r10 -a$a /usr/lib/python3.6/**; 
done
echo "L2 data, 9 answer"
for a in lcpVe lcpR lcpB lcpVR; do $lcp -r10 -a$a /usr/bin/*; done
echo "L3 data, 0 answer"
for a in lcpVe lcpR lcpB lcpVR; do (cd /usr/lib/python3.6; $lcp -r10 -a$a 
**); done
echo "L2 data, 0 answer"
for a in lcpVe lcpR lcpB lcpVR; do (cd /usr/bin; $lcp -r10 -a$a *); done


Run

and used a little `nim-pgo` wrapper I have to use gcc's profile guided 
optimization (`-fprofile-generate` and `-fprofile-use`), compiling, running a 
test/bench, then re-compiling. With all that, I get: 


L3 data, 19 answer
 1910.448..2253.771 1959.491 +- 32.812 usec via Vertical 19
 686.884..1150.131 777.125 +- 48.653 usec via Range 19
 922.680..1522.064 996.113 +- 58.569 usec via BinSearch 19
 693.560..1163.244 786.591 +- 49.932 usec via VRange 19
L2 data, 9 answer
 41.246..63.181 45.848 +- 2.480 usec via Vertical 9
 37.193..54.359 40.936 +- 1.778 usec via Range 9
 22.650..25.511 23.413 +- 0.353 usec via BinSearch 9
 36.716..55.075 41.509 +- 2.292 usec via VRange 9
L3 data, 0 answer
 0.000..0.954 0.167 +- 0.094 usec via Vertical 0
 535.488..1017.570 627.780 +- 48.740 usec via Range 0
 58.651..205.994 82.874 +- 15.300 usec via BinSearch 0
 0.715..3.576 1.097 +- 0.278 usec via VRange 0
L2 data, 0 answer
 0.000..0.000 0.000 +- 0.000 usec via Vertical 0
 32.663..38.862 33.927 +- 0.650 usec via Range 0
 3.099..4.768 3.290 +- 0.166 usec via BinSearch 0
 0.000..0.238 0.048 +- 0.032 usec via VRange 0


Run

which makes total sense to me. The @marks performance discrepancies (except his 
mysterious `foldl` bit), probably just come from the "mix"/averaging of how 
many zero length answers are in his inputs.

So, this new variant of that VRange hybrid method would make the most sense to 
recommend for general usage (of the above selection, anyway). It's never that 
slow, fastest at large scale when performance matters most, and can leverage 
early exit in easy cases.

The binary search way for smaller scale data may be best if the caller already 
knows and can pass in a data scale parameter. If we must measure the data size, 
that takes a whole pass through the length fields. That measurement eats into 
the 36/22=1.6x L2-non-zero speed ratio, probably reducing it to like 1.3x which 
isn't that much a boost for all the complexity of measurement+algorithm 
switching. I got about a 1.15x boost from PGO and most probably skip that 
entirely to avoid build complexity.


Re: A path commonPrefix function

2020-01-07 Thread dmfusa
Nowaways, it is very easy for us to buy cheap shoes either online or offline. 
However, it is essential to locate the best supplier to get best quality shoes 
for cheaper price. If anyone wants to buy cheap high trendy shoes, then our 
platforms are the best option.

We provide the best quality Shoes with fast and free shipping.You can get them 
at the most affordable prices. Come to our online store and order now!

Cheap Golden Goose Shoes [https://www.vogueluna.com](https://www.vogueluna.com)

Cheap Balenciaga Triple S 
[https://www.sneaker2all.com](https://www.sneaker2all.com)

Cheap Christian Louboutin Shoes 
[https://www.voguequeens.com](https://www.voguequeens.com)

Cheap Air Force 1 Shoes 
[https://www.hotsellstyle.com/nike-air-force-1.html](https://www.hotsellstyle.com/nike-air-force-1.html)

Cheap Yeezy 350v2 Shoes 
[https://www.sneaker4shoes.com/adidas-yeezy-shoes.html](https://www.sneaker4shoes.com/adidas-yeezy-shoes.html)


Re: A path commonPrefix function

2020-01-06 Thread cblake
For explicitness, 


proc lcpLenVRange(strs: openArray[string]): int =
  if strs.len == 0: return -1 # raise?
  var minAt, maxAt: int
  for i in 1 ..< strs.len:
if strs[i] < strs[minAt]: minAt = i
if strs[i] > strs[maxAt]: maxAt = i
if strs[i].len > 0 and strs[i][0] != strs[0][0]:
  return 0
  for i, c in strs[minAt]:
if c != strs[maxAt][i]: return i
  return strs[minAt].len


Run

Integrating that into the test harness appropriately and running on my two 
running data input examples (extended to a case with zero length common 
prefix), and switching to `mnmx` (which gives the range of times instead of 
mean+-sdev) then gives: 


$ for a in lcpVe lcpR lcpB lcpVR; { (repeat 20; lcp -a$a 
/usr/lib/python3.6/**)|mnmx }
19 in 2300.262..2448.32 microseconds via lcpVertical
19 in 1122.236..1162.767 microseconds via lcpRange
19 in 1523.256..1558.781 microseconds via lcpBinSearch
19 in 1189.47..1224.756 microseconds via lcpVRange

$ for a in lcpVe lcpR lcpB lcpVR; { (repeat 20; lcp -a$a /usr/bin/*)|mnmx }
9 in 51.737..76.532 microseconds via lcpVertical
9 in 48.637..68.903 microseconds via lcpRange
9 in 36.24..40.054 microseconds via lcpBinSearch
9 in 46.253..54.598 microseconds via lcpVRange

$ for a in lcpVe lcpR lcpB lcpVR; { (cd /usr/bin; repeat 20; lcp -a$a 
*)|mnmx }
0 in 3.099..5.722 microseconds via lcpVertical
0 in 43.392..80.109 microseconds via lcpRange
0 in 8.583..13.113 microseconds via lcpBinSearch
0 in 3.099..5.245 microseconds via lcpVRange

$ for a in lcpVe lcpR lcpB lcpVR; { (cd /usr/lib/python3.6; repeat 20; lcp 
-a$a **)|mnmx }
0 in 7.868..11.683 microseconds via lcpVertical
0 in 911.713..959.158 microseconds via lcpRange
0 in 174.522..190.735 microseconds via lcpBinSearch
0 in 9.537..14.782 microseconds via lcpVRange


Run

So, you can get the early exit of the vertical method for zero common prefices 
in the range method without very much extra cost. Eg., slightly 
negative/in-the-noise cost for the L2 prefixLen==9 case and 6% slower for the 
prefixLen=19 L3 case with the early exits basically matching times in the two 
prefixLen==0 cases.

(and, yeah, yeah..the Nim program should `import stats`, do the repeated loops 
internally updating a `RunningStat`, and report. It could even take an `lcpAll` 
enum to report on all algorithms..and -- if one wants to commit to only file 
tree inputs -- maybe even take just a directory as a command parameter, `import 
oswalkdir`, and use `walkDir` and `walkDirRec` instead of `*` and `**` to 
remove dependency on shell features.)


Re: A path commonPrefix function

2020-01-06 Thread cblake
(One could probably inline the range work into the lcp and keep track of just 
the first character and exit early if it's ever different to combine the best 
properties...)


Re: A path commonPrefix function

2020-01-06 Thread cblake
Oops. Didn't survive some edits. Good catch as always, @e. Will fix in the 
prior post with a note. Does not effect my results which are like this:


$ echo /usr/bin/*|wc
  13077   58306
$ (repeat 50; lcp -alcpV /usr/bin/*)|mnsd
9 in 53.97 +- 0.30 microseconds via lcpVertical
$ (repeat 50; lcp -alcpR /usr/bin/*)|mnsd
9 in 51.33 +- 0.42 microseconds via lcpRange
$ (repeat 50; lcp -alcpB /usr/bin/*)|mnsd
9 in 39.29 +- 0.78 microseconds via lcpBinSearch


Run

(I use Zsh which has `repeat` and a little mean-sdev computer.) Then for a 
larger inputs: 


$ echo /usr/lib/python3.6/**|wc
  1   39036 3108640
$ (repeat 10 lcp -alcpV /usr/lib/python3.6/**)|mnsd
19 in 2388. +- 14. microseconds via lcpVertical
$ (repeat 10 lcp -alcpR /usr/lib/python3.6/**)|mnsd
19 in 1131.1 +- 1.5 microseconds via lcpRange
$ (repeat 10 lcp -alcpB /usr/lib/python3.6/**)|mnsd
19 in 1523. +- 17. microseconds via lcpBinSearch


Run

The L3 on an i7-6700k is 8MB. So, this is showing the range method pulling 
ahead while still in L3 (3.1 MB). At one pass, it really has to be 
approximately the fastest at large scale. Reading from DIMMs the difference 
should be even more pronounced in favor of range.

It's possible that that marks/vertical scan style might be faster if the answer 
is very small. E.g., for a zero common prefix length, the answer will be 
concluded after just one pass down the first column via `lcpVertical`.


Re: A path commonPrefix function

2020-01-06 Thread e
In `rangeAt` you have 


if xs[result.minAt] < xs[i]: result.minAt = i
if xs[result.minAt] > xs[i]: result.minAt = i


Run

whereas I suspect you meant to use `maxAt` in the second line. Does this affect 
your timing results?


Re: A path commonPrefix function

2020-01-06 Thread cblake
It's a bit tricky to explain why your `foldl` would be much faster than my 
`minLen` (note "min" not "max"). Perhaps you aren't compiling with max 
optimization/`-d:danger` or something? Or perhaps some peculiarity related to 
your test data. Total data size, number of strings, and size of the found 
prefices are more salient statistics than how many times you repeat, though 
another possible explanation is that 100k repeats in a tight loop does 
something weird with a warmed up CPU branch predictor. My timings come from a 
Unix shell loop doing 10..100 repeated runs of the whole program. As with much 
benchmarking, it's debatable what is most representative.

I'll include the whole program below for reference. I called @marks' algorithm 
`lcpVertical` for vertically-oriented scanning and swapped `i` & `j` to have 
more traditional `j=column index` notation and slightly earlier exit. Because 
cligen allows unique prefixes `./lcp -alcpb /usr/bin/\*` is a valid way to run 
things in binary search mode. I don't do the final `rfind(sep)` work in any of 
those, but it's the same work for all the algorithms anyway and just a couple 
line wrapper `proc`.

Anyway, I've run with a variety of inputs and things track my stated 
expectations perfectly (compiled with devel version of nim, d:danger, gcc-9.2, 
on Linux, i6700k). For really large inputs the range algo works best..for 
really small the binary search. So, an optimal at all scales would probably 
switch from binary search to range at some machine/system-load dependent 
threshold.

stdlib-wise, the `range` LCP algo is never worse than about 2X the run time on 
my various /usr/xyz/* type tests, though. So, from a non-tuning/performance 
robustness point of view, it would be a better candidate for stdlib inclusion. 
Beyond the "one stop shopping" algo-wise and almost trivial implementation, its 
`rangeAt` & `range` helpers are much more basic/common operations than longest 
common prefix. Beyond that basicness, as mentioned the value-based `range` may 
afford strong-speed-ups for specialized types. So, it might be helpful to have 
a single point of reference that could be optimized/assumed optimized the way 
certain standard C library functions are, such as `memchr`.


when defined(foldl):
  import sequtils # For me this is slower
  proc minLen(strs: openArray[string]): int {.inline.} =
foldl(strs.mapIt(len(it)), min(a, b))
else:
  proc minLen(strs: openArray[string]): int {.inline.} =
result = int.high
for s in strs: result = min(result, s.len)

proc check(strs: openArray[string]; j0, j1: int): bool =
  let first = strs[0]
  for s in strs:
for j in j0 .. j1:
  if s[j] != first[j]: return false
  return true

proc lcpLenBinSearch(strs: openArray[string]): int =
  if strs.len == 0: return 0  # maybe -1 instead?
  if strs.len == 1: return strs[0].len
  var lo = 0  # Binary search
  var hi = strs.minLen - 1
  while lo <= hi:
let mid = lo + (hi - lo) div 2
if check(strs, lo, mid):  # All strs match this
  result += mid + 1 - lo  # Append to answer
  lo = mid + 1# Go higher
else: # Go lower
  hi = mid - 1

proc rangeAt*[T](xs: openArray[T]): tuple[minAt, maxAt: int] =
  if xs.len == 0: return (-1, -1) # raise?
  for i in 1 ..< xs.len:
if xs[result.minAt] < xs[i]: result.minAt = i
if xs[result.minAt] > xs[i]: result.minAt = i

proc range*[T](xs: openArray[T]): tuple[min, max: T] =
  if xs.len == 0: return  # raise?
  let (minAt, maxAt) = xs.rangeAt
  result.min = xs[minAt]
  result.max = xs[maxAt]

proc lcpLenRange(strs: openArray[string]): int =
  if strs.len == 0: return -1 # raise?
  let (minAt, maxAt) = strs.rangeAt
  for i, c in strs[minAt]:
if c != strs[maxAt][i]: return i
  return strs[minAt].len

proc lcpLenVertical(strs: openArray[string]): int =
  if strs.len == 0: return 0  # simplified marks algo
  let first = strs[0]
  for j in 0 ..< first.len:
for i in 1 ..< strs.len:
  if j >= strs[i].len or first[j] != strs[i][j]:
return j
  return first.len

type lcpAlgo* = enum lcpBinSearch, lcpRange, lcpVertical

proc lcpLen*(strs: openArray[string], algo=lcpVertical): int =
  case algo
  of lcpBinSearch: return strs.lcpLenBinSearch
  of lcpRange: return strs.lcpLenRange
  of lcpVertical:  return strs.lcpLenVertical

proc lcp*(strs: openArray[string], algo=lcpBinSearch): string =
  if strs.len < 1: return ""
  result = strs[0][0 ..< strs.lcpLen(algo)]

when isMainModule:  # asserts are the Rosetta Code tests
  assert lcp(["interspecies", "interstellar", "interstate"]) == "inters"
  assert lcp(["thr

Re: A path commonPrefix function

2020-01-06 Thread marks
Using your `for j in 1 .. paths.high()` is _much_ faster than my `for s in 
paths[1 .. ^1]`, but for me my `foldl` is much faster than your `maxLen`. (My 
test data isn't great though, a dozen calls, eleven with common prefixes, 
repeated 100K times.)

Maybe a common prefix algorithm should be in the std. lib., perhaps with one 
version that is "raw" and another which does prefixes only at a given separator?


Re: A path commonPrefix function

2020-01-05 Thread cblake
And before someone corrects me, an even less embarrassing variant might be: 


proc rangeAt*[T](xs: openArray[T]): tuple[minAt, maxAt: int] =
  if xs.len < 1: return (-1, -1)  # raise?
  for i in 1 ..< xs.len:  # note tuple ints init to zero
if xs[result.minAt] < xs[i]: result.minAt = i
if xs[result.minAt] > xs[i]: result.minAt = i


Run

with then `let (minAt, maxAt) = strs.rangeAt` and tracking `(min|max)` -> 
`strs[(min|max)At]` changes in the string comparison for the prefix length 
algo. Some paired-up value oriented interface for those who want it would also 
make sense: 


proc range*[T](xs: openArray[T]): tuple[min, max: T] {.inline.} =
  if xs.len < 1: return   # raise?
  let (minAt, maxAt) = xs.rangeAt
  result.min = xs[minAt]
  result.max = xs[maxAt]


Run

in which case maybe no change to the prefix len algo is needed.

While off the topic of the prefix calculation, it perhaps bears noting that 
SSE/AVX calculations can sometimes speed up the value-oriented versions of 
these range computations over arrays of CPU supported types like 
`openArray[float32]` by large factors - 24x even. Basically branchless and 
batching vector min/max instructions can be leveraged, but I am unaware of CPUs 
supporting "minAt" type vector instructions. Recent versions of gcc can 
recognize these min/max type value sweeps and correctly generate optimized 
code, but such pattern recognition can often fail as the code grows just a 
little more complex.


Re: A path commonPrefix function

2020-01-05 Thread cblake
> for s in paths[1 .. ^1]:

The above line really slows things down. Pretty sure it's creating a copy of 
that slice every outer loop. Change it to `for j in 1 ..< paths.len:` and use 
`s[j][i]` below.

I still timed your version as about 1.5x the run time of my version. That 
residual was attributable to your `foldl` vs my `minLen`. It might be simpler 
to just use my version completely when adjudicating it. :-)

Also, if you can, you should measure/say something about your expected total 
number of strings, average length (total data), and typical answer length.

A less embarasingly slow version of `range` for "expensive `T`" than my prior 
post is: 


proc range*[T](xs: openArray[T]): tuple[min, max: T] =
  if xs.len < 1: return   # default vals for T
  var iMin, iMax: int # zero init
  for i in 1 ..< xs.len:
if xs[iMin] < xs[i]: iMin = i
if xs[iMax] > xs[i]: iMax = i
  result.min = xs[iMin]
  result.max = xs[iMax]


Run

With this new version, the one-pass version is only about 1.5x the run-time of 
the binary search version on that easily L2 cachable 3078 string avg len 17 
bytes with prefix 9 running example of mine.

Of course, as alluded to, the ratio of main DIMM bandwidth to L[12] cache 
bandwidth is almost always much greater than a small factor like 1.5..More like 
4-5x. So, at some point the one pass range/string compare method will be 
fastest, but that point may not matter for your use case.


Re: A path commonPrefix function

2020-01-05 Thread marks
I tried the binary search approach but it was still much slower than my best 
version on my test data.

For reference, here's what I tried: 


proc commonPrefix*(paths: openArray[string], sep = "/"): string =
  if len(paths) < 1:
return ""
  let first = paths[0]
  if len(first) < 2:
return first
  var lo = 0
  var hi = foldl(paths.mapIt(len(it)), min(a, b)) - 1 # min len
  while lo <= hi:
let mid = lo + (hi - lo) div 2
var ok = true
block loop:
  for s in paths[1 .. ^1]:
for i in lo .. mid:
  if s[i] != first[i]:
ok = false
break loop
if ok:
  result.add(first[lo .. mid])
  lo = mid + 1
else:
  hi = mid - 1


Run


Re: A path commonPrefix function

2020-01-04 Thread cblake
`result += mid + 1 - lo` is, indeed, way cheaper than `result.add`, but that 
binary search approach only does the `result.add` about `ceil(log_2(minLen))` 
times (i.e., like 3-10 times probably). I couldn't even measure the difference 
on my 3078 entry average total length 17 `/usr/bin/\*` example (i.e. noise was 
>> difference).

Regardless, I **100% agree** that separating into a prefixLen and then getting 
that slice would be a nicer factoring anyway. The caller may only need to test 
the length against `0` or something.

Another algorithm of interest in terms of this whole thread is the "just 
compare first & last in sorted order" approach. This very "tidy" from both a 
conceptual and coding point of view: 


proc range*[T](xs: openArray[T]): tuple[min, max: T] =
  if xs.len < 1: return   # default vals for T
  result.min = xs[0]
  result.max = xs[0]
  for x in xs:
result.min = min(result.min, x)
result.max = max(result.max, x)

proc longestCommonPrefixLen*(strs: openArray[string]): int =
  let (min, max) = strs.range
  for i, c in min:
if c != max[i]: return i
  return min.len


Run

As written, that's about 6x slower than the binary search approach for L2 
resident data. It's probably possible to really speed up `range` for 
`openArray[string]` with some shallow copies in the loop and a real copy at the 
end. That might be a fun exercise for someone.

Indeed, for very large inputs, bandwidth starvation of modern CPUs will mean 
that doing only one pass over said inputs is going to be best, even if it does 
quite a bit more CPU ALU-type work. So, even if that `range` cannot be sped up 
to match/beat the binary search in this context, one would still want to switch 
to this (or some other) one pass algo to accommodate giant inputs in a more 
general setting..probably past the L2 or L3 boundary (assuming it can run with 
minimal cache competition).

@marks did not really discuss the size of his inputs or his use case, though 
his second "faster on his test set" is notably a multi-pass algorithm.


Re: A path commonPrefix function

2020-01-04 Thread Araq
I would return an `int` instead of a string, much cheaper.


Re: A path commonPrefix function

2020-01-04 Thread cblake
Eg., in Nim it might look like this: 


proc minLen(strs: openArray[string]): int =
  result = int.high
  for s in strs: result = min(result, s.len)

proc check(strs: openArray[string]; j0, j1: int): bool =
  let srcSlc = strs[0][j0..j1]
  for s in strs:
if s[j0..j1] != srcSlc: return false
  return true

proc longestCommonPfx*(strs: openArray[string]): string=
  if strs.len < 1: return ""
  let src = strs[0]
  if strs.len < 2: return src
  var lo = 0  # Binary search
  var hi = strs.minLen - 1
  while lo <= hi:
let mid = lo + (hi - lo) div 2
if check(strs, lo, mid):  # All strs match this
  result.add src[lo .. mid]   # Append to answer
  lo = mid + 1# Go higher
else: # Go lower
  hi = mid - 1

when isMainModule:
  import os
  echo longestCommonPfx(commandLineParams())


Run

There are also a lot of variants at 
[http://rosettacode.org/wiki/Longest_common_prefix](http://rosettacode.org/wiki/Longest_common_prefix)

For paths you would just need to add something to backup to the most recent 
directory separator, if any.


Re: A path commonPrefix function

2020-01-04 Thread cblake
Are you aware of this binary search approach? 
[https://www.tutorialcup.com/string/longest-common-prefix-using-biary-search.htm](https://www.tutorialcup.com/string/longest-common-prefix-using-biary-search.htm)


Re: A path commonPrefix function

2020-01-04 Thread marks
This is the fastest and best so far: 


proc commonPrefix*(paths: openArray[string], sep = "/"): string =
  if len(paths) == 0:
return ""
  let first = paths[0]
  var index = -1
  block loop:
for i in 0 ..< len(first):
  for j in 1 .. paths.high():
let path = paths[j]
if i < len(path) and first[i] != path[i]:
  break loop
  index = i
  if index == -1:
return ""
  else:
return first[0 .. first.rfind(sep, 0, index)]


Run


Re: A path commonPrefix function

2020-01-04 Thread marks
Thanks for that. I discovered that my new version is 4x faster than the 
original..


Re: A path commonPrefix function

2020-01-04 Thread cblake
> Is there a nim equivalent to Python's timeit module for timing snippets?

There are probably about 100 variants in people's local code, but I happened to 
just see your message and also to just add this to `cligen/osUt` the other day 
(needs `requires "cligen#head"` in your `.nimble` file or you could just copy 
this tiny impl and adapt as you like): 


import strutils, times

template timeIt*(label:string, unit:float, places=3, sep="\n", body: 
untyped) =
  let t0 = epochTime()
  body
  let dt = epochTime() - t0
  stdout.write label, " ", formatFloat(dt * unit, ffDecimal, places), sep

timeIt("write1", 1e6, 3, " ") : stderr.write "hi"
timeIt("write2", 1e6, 3, "\n"): stderr.write "there"


Run

You may want to add loops to repeat and use the `stats.RunningStat` stdlib API 
to do something nice more like Python's version. There is also this package 
called golden 
([https://github.com/disruptek/golden](https://github.com/disruptek/golden)) to 
try to iterate until timing noise has been suppressed. And probably other 
various solutions, too. (And sometimes noise **is** the data and you want 
cold-cache/system competition effects included...benchmarking is obviously a 
larger topic than a snippet thing.)


Re: A path commonPrefix function

2020-01-04 Thread marks
This version walks forward char by char. Is there a nim equivalent to Python's 
`timeit` module for timing snippets? 


proc commonPrefix*(paths: openArray[string], sep = "/"): string =
  if len(paths) == 0:
return ""
  result = paths[0]
  var index = 0
  var ok = false
  for path in paths[1 .. ^1]:
while index < len(path) and index < len(result) and
result[index] == path[index]:
  inc index
  ok = true
  result = if not ok:
  ""
  else:
result[0 .. result.rfind(sep, 0, index)]


Run


Re: A path commonPrefix function

2020-01-03 Thread demotomohiro
When your commonPrefix takes 2 paths, length of both paths are n and both paths 
has no common prefix, it has O(n^2) time complexity. Because startsWith proc is 
O(n) and it is called n times.

relativePath proc in os module calculate common prefix in O(n) time complexity. 
[https://github.com/nim-lang/Nim/blob/devel/lib/pure/os.nim](https://github.com/nim-lang/Nim/blob/devel/lib/pure/os.nim)


Re: A path commonPrefix function

2020-01-03 Thread SolitudeSF
something like that 


proc commonPrefix*(paths: openArray[string], sep = "/"): string =
  if len(paths) > 0:
result = paths[0]
for i in 1 .. paths.high:
  let path = paths[i]
  while not path.startsWith(result) and len(result) > 0:
result.setLen(result.high)
result.setLen(result.rfind(sep) + 1)


Run


A path commonPrefix function

2020-01-03 Thread marks
I have the following `commonPrefix()` function (for whole paths) which works 
fine on Linux (and Windows), but wondering if it could be shorter and/or more 
efficient?


proc commonPrefix*(paths: openArray[string], sep = "/"): string =
  if len(paths) == 0:
return ""
  result = paths[0]
  for path in paths[1 .. ^1]:
while not path.startsWith(result) and len(result) > 0:
  result = result[0 .. ^2]
  result = result[0 .. result.rfind(sep)]


Run

Tests: 


test "commonPrefix":
  check(commonPrefix(["/tmp/test1", "/tmp/data.dat"]) == "/tmp/")
  check(commonPrefix(["/home/mark/test1", "/tmp/data.dat"]) == "/")
  check(commonPrefix(["/home/mark/test1", "local/data.dat"]) == "")


Run